ABSTRACT

We present a comprehensive observational study of the gas phase metallicity of star-forming galaxies from 
z ~ 0 → 3. We combine our new sample of gravitationally lensed galaxies with existing lensed and
non-lensed samples to conduct a large investigation into the mass-metallicity (MZ) relation at z > 1. We apply a
self-consistent metallicity calibration scheme to investigate the metallicity evolution of star-forming galaxies 
as a function of redshift. The lensing magnification ensures that our sample spans an unprecedented range of 
stellar mass (3 × 10^7 to 6 × 10^10 M⊙). We find that at the median redshift of z = 2.07, the median metallicity
of the lensed sample is 0.35 dex lower than the local SDSS star-forming galaxies and 0.18 dex lower than the 
z ~ 0.8 DEEP2 galaxies. We also present the z ~ 2 MZ relation using 19 lensed galaxies. A more rapid 
evolution is seen between z ~ 1 → 3 than z ~ 0 → 1 for the high-mass galaxies (10^9.5 M⊙ < M* < 10^11
M⊙), with almost twice as much enrichment between z ~ 1 → 3 as between z ~ 1 → 0. We compare
this evolution with the most recent cosmological hydrodynamic simulations with momentum driven winds. We 
find that the model metallicity is consistent with the observed metallicity within the observational error for 
the low mass bins. However, for higher masses, the model over-predicts the metallicity at all redshifts. The 
over-prediction is most significant in the highest mass bin of 10^(10-11) M⊙.

Subject headings: galaxies: abundances — galaxies: evolution — galaxies: high-redshift — gravitational 
lensing: strong 



1. INTRODUCTION 

Soon after the pristine clouds of primordial gas collapsed 
to assemble a protogalaxy, star formation ensued, leading to 
the production of heavy elements (metals). Metals were syn- 
thesized exclusively in stars, and were ejected into the inter- 
stellar medium (ISM) through stellar winds or supernovae ex- 
plosions. Tracing the heavy element abundance (metallicity) 
in star-forming galaxies provides a "fossil record" of galaxy 
formation and evolution. 

When considered as a closed system, the metal content of a 
galaxy is directly related to the yield and gas fraction (Searle 
& Sargent 1972; Pagel & Patchett 1975; Pagel & Edmunds 
1981; Edmunds 1990). In reality, a galaxy interacts with 
its surrounding intergalactic medium (IGM), hence both the 
overall and local metallicity distribution of a galaxy is mod- 
ified by feedback processes such as galactic winds, inflows, 
and gas accretions (e.g., Lacey & Fall 1985; Edmunds & 
Greenhow 1995; Koppen & Edmunds 1999; Dalcanton 2007). 
Therefore, observations of the chemical abundances in galax- 
ies offer crucial constraints on the star formation history and 
various mechanisms responsible for galactic inflows and out- 
flows. 

The well-known correlation between galaxy mass (lumi- 
nosity) and metallicity was first proposed by Lequeux et al. 
(1979). Subsequent studies confirmed the existence of the 
luminosity-metallicity (LZ) relation (e.g., Rubin et al. 1984; 
Skillman et al. 1989; Zaritsky et al. 1994; Garnett 2002). 
Luminosity was used as a proxy for stellar mass in these 
studies as luminosity is a direct observable. Aided by new
sophisticated stellar population models, stellar mass can be 
robustly calculated and a tighter correlation is found in the 
mass-metallicity (MZ) relation. Tremonti et al. (2004) have 
established the MZ relation for local star-forming galaxies 
based on ~5 × 10^5 Sloan Digital Sky Survey (SDSS) galaxies.
At intermediate redshifts (0.4 < z < 1), the MZ relation
has also been observed for a large number of galaxies (>100) 
(e.g., Savaglio et al. 2005; Cowie & Barger 2008; Lamareille 
et al. 2009). Zahid et al. (2011) derived the MZ relation for
~10^3 galaxies from the Deep Extragalactic Evolutionary Probe
2 (DEEP2) survey, validating the MZ relation on a statistically 
significant level at z ~ 0.8. 

Current cosmological hydrodynamic simulations and semi- 
analytical models can predict the metallicity history of galax- 
ies on a cosmic timescale (Nagamine et al. 2001; De Lucia 
et al. 2004; Bertone et al. 2007; Brooks et al. 2007; Dave & 
Oppenheimer 2007; Dave et al. 2011a,b). These models show
that the shape of the MZ relation is particularly sensitive to the 
adopted feedback mechanisms. Cosmological hydrodynamic
simulations with momentum-driven wind models provide
a better match with observations than energy-driven wind
models (Oppenheimer & Dave 2008; Finlator & Dave 2008; 
Dave et al. 2011a). However, these models have not been 
tested thoroughly in observations, especially at high redshifts 
(z > 1), where the MZ relation is still largely uncertain. 

As we move to higher redshifts, selection effects and small 
number statistics haunt observational metallicity history studies.
The difficulty becomes more severe in the so-called
"redshift desert" (1 < z < 3), where the metallicity-sensitive
optical emission lines have shifted to the sky-background-dominated
near infrared (NIR). Ironically, this redshift range
harbors the richest information about galaxy evolution. It is
during this redshift period (~2-6 Gyr after the Big Bang) that
the first massive structures condensed; the star formation rate
(SFR), major merger activity, and black hole accretion rate
peaked; much of today's stellar mass was assembled, and 
heavy elements were produced (Fan et al. 2001; Dickinson 
et al. 2003; Chapman et al. 2005; Hopkins & Beacom 2006; 
Grazian et al. 2007; Conselice et al. 2007; Reddy et al. 2008). 
It is therefore of crucial importance to explore NIR spectra for 
galaxies in this redshift range. 

Many spectroscopic redshift surveys have been carried out 
to study star-forming galaxies at z >1 in recent years (e.g., 
Steidel et al. 2004; Law et al. 2009). However, due to the
low efficiency in the NIR, those spectroscopic surveys almost
inevitably have to rely on color-selection criteria, and
UV selection tends to favor the most massive and
least dusty systems (e.g., Capak et al. 2004; Steidel
et al. 2004; Reddy et al. 2006). Space telescopes can observe 
much deeper in the NIR and are able to probe a wider mass 
range. For example, the narrow-band Hα surveys based on
the new WFC3 camera aboard the Hubble Space Telescope
(HST) have located hundreds of Hα emitters up to z = 2.23,
finding much fainter systems than observed from the ground 
(Sobral et al. 2009). However, the low-resolution spectra from 
the narrow band filters forbid derivations of physical proper- 
ties such as metallicities that can only currently be acquired 
from ground-based spectral analysis. 

Thanks to the advent of long-slit/multi-slit NIR spectrographs
on 8-10 meter class telescopes, enormous progress
has been made in the last decade to capture galaxies in the
redshift desert. For chemical abundance studies, a full coverage
of rest-frame optical spectra (4000-9000 Å) is usually
mandatory for the most robust diagnostic analysis. For 1.5 
< z < 3, the rest-frame optical spectra have shifted into 
the J, H, and K bands. It remains challenging and observa- 
tionally expensive to obtain high signal-to-noise (S/N) NIR 
spectra from the ground, especially for "typical" targets at 
high-z that are less massive than conventional color-selected 
galaxies. Therefore, previous investigations into the metallic- 
ity properties between 1 < z < 3 focused on stacked spec- 
tra, samples of massive luminous individual galaxies, or very 
small numbers of lower-mass galaxies (e.g., Erb et al. 2006; 
Forster Schreiber et al. 2006; Law et al. 2009; Erb et al. 2010; 
Yabe et al. 2012). 

The first mass-metallicity (MZ) relation for galaxies at z ~ 
2 was found by Erb et al. (2006) using the stacked spectra 
of 87 UV selected galaxies divided into 6 mass bins. Sub- 
sequently, mass and metallicity measurements have been re- 
ported for numerous individual galaxies at 1.5 < z < 3 
(Forster Schreiber et al. 2006; Genzel et al. 2008; Hayashi 
et al. 2009; Law et al. 2009; Erb et al. 2010). These galaxies 
are selected using broadband colors in the UV (Lyman Break 
technique; Steidel et al. 1996, 2003) or using B, z, and K- 
band colors (BzK selection; Daddi et al. 2004). The Lyman 
break and BzK selection techniques favor galaxies that are lu- 
minous in the UV or blue and may therefore be biased against 
low luminosity (low-metallicity) galaxies, and dusty (poten- 
tially metal-rich) galaxies. Because of these biases, galaxies 
selected in this way may not sample the full range in
metallicity at redshift z > 1.

A powerful alternative method to avoid these selection ef- 
fects is to use strong gravitationally lensed galaxies. In the 
case of galaxy cluster lensing, the total luminosity and area of 
the background sources can easily be boosted by ~10-50
times, providing invaluable opportunities to obtain high S/N 
spectra and probe intrinsically fainter systems within a
reasonable amount of telescope time. In some cases, sufficient
S/N can even be obtained for spatially resolved pixels to study
the resolved metallicity of high-z galaxies (Swinbank et al. 
2009; Jones et al. 2010; Yuan et al. 2011; Jones et al. 2012). 
Before 2011, metallicities had been reported for a handful of
individually lensed galaxies using optical emission lines at 1.5
< z < 3 (Pettini et al. 2001; Lemoine-Busserolle et al. 2003; 
Stark et al. 2008; Quider et al. 2009; Yuan & Kewley 2009; 
Jones et al. 2010). Fortunately, lensed galaxy samples with 
metallicity measurements have increased significantly thanks 
to reliable lensing mass modeling and larger dedicated spec- 
troscopic surveys of lensed galaxies on 8-10 meter telescopes 
(Richard et al. 2011; Wuyts et al. 2012; Christensen et al. 
2012). 

In 2008, we began a spectroscopic observational survey de- 
signed specifically to capture metallicity sensitive lines for 
lensed galaxies. Taking advantage of the multi-object cryo- 
genic NIR spectrograph (MOIRCS) on Subaru, we targeted 
well-known strong lensing galaxy clusters to obtain metallic- 
ities for galaxies between 0.8 < z < 3. In this paper, we 
present the first metallicity measurement results from our sur- 
vey. 

Combining our new data with existing data from the litera- 
ture, we present a coherent observational picture of the metal- 
licity history and mass-metallicity evolution of star-forming 
galaxies from z ~ 0 to z ~ 3. Kewley & Ellison (2008) have
shown that the metallicity offsets in the diagnostic methods 
can easily exceed the intrinsic trends. It is of paramount im- 
portance to make sure that relative metallicities are compared 
on the same metallicity calibration scale. In MZ relation stud- 
ies, the methods used to derive the stellar mass can also cause 
systematic offsets (Zahid et al. 2011). Different SED fitting 
codes can yield a non-negligible mass offset, hence mimick- 
ing or hiding evolution in the MZ relation. In this paper, we 
derive the mass and metallicity of all samples using the same 
methods, ensuring that the observational data are compared in 
a self-consistent way. We compare our observed metallicity 
history with the latest prediction from cosmological hydrody- 
namical simulations. 

Throughout this paper we use a standard ΛCDM cosmology
with H_0 = 70 km s^-1 Mpc^-1, Ω_M = 0.30, and Ω_Λ = 0.70. We
use solar oxygen abundance 12 + log(O/H) = 8.69 (Asplund
et al. 2009).
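
As a quick check of what the adopted cosmology implies, the short
sketch below evaluates the age of the universe at a given redshift and
expresses an oxygen abundance in solar units. It assumes a flat ΛCDM
universe with radiation neglected and uses the standard closed-form
age expression for that case; the function and constant names are
illustrative and are not taken from any code used in this work.

```python
import math

H0_KM_S_MPC = 70.0       # Hubble constant adopted in the text
OMEGA_M = 0.30           # matter density parameter
OMEGA_LAMBDA = 0.70      # dark-energy density parameter
SOLAR_OH = 8.69          # solar 12 + log(O/H) (Asplund et al. 2009)

KM_PER_MPC = 3.0857e19
SECONDS_PER_GYR = 3.15576e16


def cosmic_age_gyr(z):
    """Age of a flat Lambda-CDM universe at redshift z in Gyr (radiation neglected)."""
    hubble_time_gyr = KM_PER_MPC / H0_KM_S_MPC / SECONDS_PER_GYR
    x = math.sqrt(OMEGA_LAMBDA / OMEGA_M) * (1.0 + z) ** -1.5
    return hubble_time_gyr * 2.0 / (3.0 * math.sqrt(OMEGA_LAMBDA)) * math.asinh(x)


def metallicity_in_solar_units(oh12):
    """Convert 12 + log(O/H) into a linear fraction of the solar oxygen abundance."""
    return 10.0 ** (oh12 - SOLAR_OH)


if __name__ == "__main__":
    for z in (0.0, 1.0, 3.0):
        print(f"z = {z:.0f}: age = {cosmic_age_gyr(z):.1f} Gyr")
```

With these parameters the interval 1 < z < 3 maps onto roughly 2-6 Gyr
after the Big Bang, in line with the range quoted in the Introduction.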

The paper is organized as follows: Section 2 describes our 
lensed sample survey and observations. Data reduction and 
analysis are summarized in Section 3. Section 4 presents an 
overview of all the samples we use in this study. Section 5 de- 
scribes the methodology of derived quantities. The metallicity 
evolution of star-forming galaxies with redshift is presented in 
Section 6. Section 7 presents the mass-metallicity relation for 
our lensed galaxies. Section 8 compares our results with pre- 
vious work in literature. Section 9 summarizes our results. In 
the Appendix, we show the morphology, slit layout, and
reduced 1D spectra for the lensed galaxies reported in our
survey.

2. THE LEGMS SURVEY AND OBSERVATIONS 

2.1. The Lensed Emission-Line Galaxy Metallicity Survey 
(LEGMS) 

Our survey (LEGMS) aims to obtain oxygen abundance 
of lensed galaxies at 0.8<z<3. LEGMS has taken enor- 
mous advantage of the state-of-the-art instruments on Mauna 
Kea. Four instruments have been utilized so far: (1) the 
Multi-Object InfraRed Camera and Spectrograph (MOIRCS; 

TABLE 1
MOIRCS Observation Summary

Target      Dates                 Exposure Time  PA     Seeing (K_s)  Slit width  Filter/Grism
                                  (ks)           (deg)  (")           (")
Abell 1689  Apr 28, 2011          50.0           60     0.5-0.8       ...         K_s Imaging
Abell 1689  Apr 28, 2010          15.6           -60    0.5-0.8       0.8         HK500
Abell 1689  Apr 29, 2010          19.2           45     0.5-0.6       0.8         HK500
Abell 1689  Mar 24, 2010          16.8           20     0.5-0.6       0.8         HK500
Abell 1689  Mar 25, 2010          12.0           -20    0.6-0.7       0.8         HK500
Abell 1689  Apr 23, Mar 24, 2008  15.6           60     0.5-0.8       0.8         zJ500
Abell 68    Sep 29-30, 2009       12.0           60     0.6-1.0       1.0         HK500, zJ500

NOTE. Log of the observations. We use a dithering length of 2.5" for all the spectroscopic observations.



Ichikawa et al. 2006) on Subaru; (2) the OH-Suppressing 
Infra-Red Imaging Spectrograph (OSIRIS; Larkin et al. 2006) 
on Keck II; (3) the Near Infrared Spectrograph (NIRSPEC; 
McLean et al. 1998) on Keck II; (4) the Low Resolution Imaging
Spectrometer (LRIS; Oke et al. 1995) on Keck I. The scientific
objective of each instrument is as follows: MOIRCS
is used to obtain the NIR images and spectra for multiple 
targets behind lensing clusters; NIRSPEC is used to capture
occasional single field lensed targets (especially galaxy-scale
lenses); LRIS is used to obtain the [O II] λ3727 to
[O III] λ5007 spectral range for targets with z < 1.5. From
the slit spectra, we select targets that have sufficient fluxes
and angular sizes to be spatially resolved with OSIRIS. In this 
paper we focus on the MOIRCS observations of the lensing 
cluster Abell 1689 for targets between redshifts 1.5 < z < 3. 
Observations for other clusters are ongoing and will be pre- 
sented in future papers. 

The first step to construct a lensed sample for slit spec- 
troscopy is to find the lensed candidates (arcs) that have 
spectroscopic redshifts from optical surveys. The number of 
known spectroscopically identified lensed galaxies at z > 1 
is still on the order of a few tens. The limited number of 
lensed candidates makes it impractical to build a sample that 
is complete and well defined in mass. A mass complete sam- 
ple is the future goal of this project. Our strategy for now is 
to observe as many arcs with known redshifts as possible. If 
we assume the AGN fraction is similar to local star-forming 
galaxies, then we expect ~ 10% of our targets to be AGN 
dominated (Kewley et al. 2004). Naturally, a lensed sample is
biased towards highly magnified sources. However, because
the largest magnifications are not biased towards intrinsically
bright targets, lensed samples are less biased towards the
intrinsically most luminous galaxies.

Abell 1689 is chosen as the primary target for MOIRCS 
observations because it has the largest number (~ 100 arcs, or 
~ 30 source galaxies) of spectroscopically identified lensed 
arcs (Broadhurst et al. 2005; Frye et al. 2007; Limousin et al. 
2007). 

Multi-slit spectroscopy of NIR lensing surveys greatly en- 
hances the efficiency of spectroscopy of lensed galaxies in 
clusters. Theoretically, ~ 40 slits can be observed simultane- 
ously on the two chips of MOIRCS with a total field of view 
(FOV) of 4' x 7'. In practice, the number of lensed targets 
on the slits is restricted by the strong lensing area, slit orien- 
tations, and spectral coverage. For A1689, the lensed candi- 
dates cover an area of ~2' x 2', well within the FOV of one 
chip. We design slit masks for chip 2, which has better
sensitivity and fewer bad pixels than chip 1. There are ~40 lensed
images (~25 individual galaxies) that fall in the range of 1.5
< z < 3 in our slit masks. We use the MOIRCS low-resolution
(R ~ 500) grisms, which have a spectral coverage of 0.9-1.78
μm in ZJ and 1.3-2.5 μm in HK. To maximize the detection
efficiency, we give priority to targets with the specific redshift
range such that all the strong emission lines from [O II] λ3727
to [N II] λ6584 can be captured in one grism configuration.
For instance, the redshift range of 2.5 < z < 3 is optimized
for the HK500 grism, and 1.5 < z < 1.7 is optimized for the
ZJ500 grism.
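
The redshift windows quoted above can be roughly recovered from the
nominal grism coverage alone. The sketch below is only an illustration:
it asks for which redshifts both [O II] λ3727 and [N II] λ6584 land
inside the quoted wavelength range, and it ignores atmospheric windows
and sensitivity roll-off at the band edges, which the actual target
prioritization may also have taken into account. The dictionary and
function names are hypothetical.

```python
OII_REST_A = 3727.0    # [O II] rest wavelength in Angstrom
NII_REST_A = 6584.0    # [N II] rest wavelength in Angstrom

# Nominal grism coverage quoted in the text, in micron.
GRISM_COVERAGE_UM = {"ZJ500": (0.9, 1.78), "HK500": (1.3, 2.5)}


def full_coverage_redshift_range(grism):
    """Redshift interval in which [O II] through [N II] fit inside one grism."""
    blue_um, red_um = GRISM_COVERAGE_UM[grism]
    z_min = blue_um * 1.0e4 / OII_REST_A - 1.0   # [O II] enters at the blue edge
    z_max = red_um * 1.0e4 / NII_REST_A - 1.0    # [N II] leaves at the red edge
    return z_min, z_max


if __name__ == "__main__":
    for name in GRISM_COVERAGE_UM:
        z_lo, z_hi = full_coverage_redshift_range(name)
        print(f"{name}: {z_lo:.2f} < z < {z_hi:.2f}")
```

Under these simplifying assumptions the nominal windows come out close
to, though not identical with, the ranges given in the text.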

From UT March 2008 to UT April 2010, we used 8 
MOIRCS nights (6 usable nights) with 4 position angles (PAs) 
and 6 masks to observe 25 galaxies. Metallicity quality spec- 
tra were obtained for 12 of the 25 targets. We also include one 
z > 1.5 galaxy from our observations of Abell 68 (see footnote 5). The PA is
chosen to optimize the slit orientation along the targeted arcs' 
elongated directions. For arcs that are not oriented to match 
the PA, the slits are configured to center on the brightest knots 
of the arcs. We use slit widths of 0.8" and 1.0", with a variety 
of slit lengths for each lensed arc. For each mask, a bright 
galaxy/star is placed on one of the slits to trace the slit cur- 
vature and determine the offsets among individual exposures. 
Typical integrations for individual frames are 400 s, 600 s, 
and 900 s, depending on levels of skyline saturation. We use 
an ABBA dithering sequence along the slit direction, with a 
dithering length of 2.5". The observational logs are summarized
in Table 1.

3. DATA REDUCTION AND ANALYSIS 

3.1. Reduction of 1D Spectra

The data reduction procedures from the raw mask data to
the final wavelength- and flux-calibrated 1D spectra were carried
out with a set of IDL codes called MOIRCSMOSRED. The
codes were originally written by Youichi Ohyama. T.-T.
Yuan extended the code to incorporate new skyline subtraction
(see, e.g., Henry et al. 2010 for a description of using
MOIRCSMOSRED).

We use the newest version (Apr, 2011) of MOIRCSMOSRED
to reduce the data in this work. The sky subtraction
is optimized as follows. For each A_i frame, we subtract
a sky frame denoted as α((B_{i-1} + B_{i+1})/2), where B_{i-1}
and B_{i+1} are the science frames before and after the A_i
exposure. The scale parameter α is obtained by searching through
a parameter range of 0.5-2.0, with an increment of 0.0001.
The best α is obtained where the root mean square (RMS) of
the residual R = A_i - α((B_{i-1} + B_{i+1})/2) is minimal for a user
5 Most of the candidates in A68 are at z < 1. Due to the low spectral
resolution in this observation, Hα and [N II] are not resolved at z < 1. We
do not have sufficient data to obtain reliable metallicities for the z < 1 targets
in A68 and therefore exclude them from this study.






defined wavelength region between λ_1 and λ_2. We find that this sky
subtraction method yields smaller sky OH line residuals (by ~20%)
than conventional A-B methods. We also compare with
other skyline subtraction methods in the literature (Kelson 2003;
Davies 2007). We find the sky residuals from our method
are comparable to those from the Kelson (2003) and Davies
(2007) methods within 5% in general cases. However, in
cases where the emission line falls on top of a strong skyline,
our method is more stable and improves the skyline residual
by ~10% relative to the other two methods.
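
The scaled sky subtraction described above can be sketched in a few
lines. The example below is an illustrative Python reimplementation,
not the MOIRCSMOSRED IDL code: it assumes the three frames have already
been cut down to the user-defined wavelength window and flattened into
equal-length sequences, and it takes the RMS about zero. Because the
squared residual is quadratic in α, the required sums are accumulated
once and the grid of α values is then scanned cheaply. All names are
hypothetical.

```python
def best_sky_scale(a_frame, b_prev, b_next,
                   alpha_min=0.5, alpha_max=2.0, step=1.0e-4):
    """Grid-search alpha minimizing the RMS of A - alpha * (B_prev + B_next) / 2.

    The inputs are equal-length sequences of pixel values restricted to
    the wavelength window used for the optimization.
    """
    sky = [0.5 * (p + n) for p, n in zip(b_prev, b_next)]
    # Sums that define the quadratic form of the squared residual.
    s_aa = sum(a * a for a in a_frame)
    s_as = sum(a * s for a, s in zip(a_frame, sky))
    s_ss = sum(s * s for s in sky)
    n_pix = len(sky)

    n_steps = int(round((alpha_max - alpha_min) / step))
    best_alpha, best_rms = alpha_min, float("inf")
    for k in range(n_steps + 1):
        alpha = alpha_min + k * step
        mean_sq = (s_aa - 2.0 * alpha * s_as + alpha * alpha * s_ss) / n_pix
        rms = max(mean_sq, 0.0) ** 0.5   # guard against round-off below zero
        if rms < best_rms:
            best_alpha, best_rms = alpha, rms

    residual = [a - best_alpha * s for a, s in zip(a_frame, sky)]
    return best_alpha, residual
```

A call such as `best_sky_scale(a_i, b_before, b_after)` returns the
selected scale together with the sky-subtracted pixels for that window.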

Wavelength calibration is carried out by identifying sky- 
lines for the ZJ grism. For the HK grism, we use argon lines to 
calibrate the wavelength since only a few skylines are avail- 
able in the HK band. The argon-line calibrated wavelength 
is then re-calibrated with the available skylines in HK to de- 
termine the instrumentation shifts between lamp and science 
exposures. Note that the RMS of the wavelength calibration 
using a 3rd order polynomial fitting is ≲ 10-20 Å, corresponding
to a systematic redshift uncertainty of 0.006.

A sample of A0 stars selected from the UKIRT photometric
standards were observed at similar airmass as the targets.
These stars were used for both telluric absorption corrections 
and flux calibrations. We use the prescriptions of Erb et al. 
(2003) for flux calibration. As noted in Erb et al. (2003), 
the absolute flux calibration in the NIR is difficult with typical
uncertainties of ~20%. We note that this uncertainty is
even larger for lensed samples observed in multi-slits because 
of the complicated aperture effects. The uncertainties in the 
flux calibration are not a concern for our metallicity analysis 
where only line ratios are involved. However, these errors are 
a major concern for calculating SFRs. The uncertainties from 
the multi-slit aperture effects can cause the SFRs to change 
by a factor of 2-3. For this reason, we refrain from any quan- 
titative analysis of SFRs in this work. 

3.2. Line Fitting 

The emission lines are fitted with Gaussian profiles. For 
the spatially unresolved spectra, the aperture used to extract 
the spectrum is determined by measuring the Gaussian profile 
of the wavelength-collapsed spectrum. Some of the lensed
targets (~10%) are elongated and spatially resolved in the
slit spectra. However, because of the low surface brightness
and thus very low S/N per pixel, we are unable to obtain usable
spatially resolved spectra. For those targets, we make
an initial guess for the width of the spatial profile and force a
Gaussian fit; we then extract the integrated spectrum using the
aperture determined from the FWHM of the Gaussian profile.

For widely separated lines such as [O II] λ3727 and Hβ λ4861,
single Gaussian functions are fitted with 4 free parameters:
the centroid (or the redshift), the line width, the line flux, and
the continuum. The doublet [O III] λλ4959,5007 is initially
fitted as a double Gaussian function with 6 free parameters:
the centroids 1 and 2, line widths 1 and 2, fluxes 1 and 2,
and the continuum. In cases where the [O III] λ4959 line is
too weak, its centroid and line velocity width are fixed to be
the same as [O III] λ5007 and the flux is fixed to be 1/3 of
the [O III] λ5007 line (Osterbrock 1989). A triple-Gaussian
function is fitted simultaneously to the three adjacent emission
lines: [N II] λλ6548,6583 and Hα. The centroid and velocity
width of the [N II] λλ6548,6583 lines are constrained by the
velocity width of Hα λ6563, and the ratio of [N II] λ6548 and
[N II] λ6583 is constrained to be the theoretical value of 1/3
given in Osterbrock (1989). The line profile fitting is conducted
using a χ² minimization procedure which uses the inverse
of the sky OH emission as the weighting function. The
S/N per pixel is calculated from the χ² of the fitting. The measured
emission line fluxes and line ratios are listed in Table 4.
The final reduced 1D spectra are shown in the Appendix.
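
To make the tied triple-Gaussian model concrete, the sketch below
writes it as a plain function together with a sky-weighted χ². It is an
assumption-laden illustration rather than the fitting code used here:
the three lines share one redshift and one velocity width (in km/s),
the [N II] λ6548 flux is fixed to 1/3 of [N II] λ6583, the continuum is
a constant, and the weights are simply the inverse of a positive sky
spectrum. Parameter and function names are hypothetical, and the
minimizer itself is left to whichever optimizer is at hand.

```python
import math

C_KM_S = 299792.458
# Rest wavelengths in Angstrom, as labelled in the text.
NII_6548, H_ALPHA, NII_6583 = 6548.0, 6563.0, 6583.0


def gaussian(wave, center, sigma, flux):
    """Gaussian line profile whose integral over wavelength equals `flux`."""
    norm = flux / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-0.5 * ((wave - center) / sigma) ** 2)


def halpha_nii_model(wave, z, sigma_kms, flux_ha, flux_nii6583, continuum):
    """Tied triple-Gaussian model for [N II] 6548, H-alpha and [N II] 6583."""
    lines = (
        (NII_6548, flux_nii6583 / 3.0),   # flux tied to 1/3 of [N II] 6583
        (H_ALPHA, flux_ha),
        (NII_6583, flux_nii6583),
    )
    total = continuum
    for rest, flux in lines:
        center = rest * (1.0 + z)               # shared redshift
        sigma = center * sigma_kms / C_KM_S     # shared velocity width
        total += gaussian(wave, center, sigma, flux)
    return total


def weighted_chi2(params, waves, fluxes, sky):
    """Squared residuals weighted by the inverse of the (positive) sky emission."""
    return sum((f - halpha_nii_model(w, *params)) ** 2 / s
               for w, f, s in zip(waves, fluxes, sky))
```

Here `params` is the tuple `(z, sigma_kms, flux_ha, flux_nii6583,
continuum)`; minimizing `weighted_chi2` over it down-weights pixels
that sit on strong OH lines.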

3.3. Lensing Magnification 

Because the lensing magnification (μ) is not a direct function
of wavelength, line ratio measurements do not require
prior knowledge of the lensing magnification. However, μ is
needed for inferring other physical properties such as the intrinsic
fluxes, masses and source morphologies. Parametric
models of the mass distribution in the clusters Abell 68
and Abell 1689 were constructed using the Lenstool software
(see footnote 6; Kneib et al. 1993; Jullo et al. 2007). The best-fit
models have been previously published in Richard et al.
(2007) and Limousin et al. (2007). As detailed in Limousin
et al. (2007), Lenstool uses Bayesian optimization with
a Monte-Carlo Markov Chain (MCMC) sampler which provides
a family of best models sampling the posterior probability
distribution of each parameter. In particular, we use this
family of best models to derive the magnification μ and the
relative error on μ associated with each lensed source.
Typical errors on μ are ~10% for Abell 1689 and Abell 68.
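
The way μ enters the derived quantities can be illustrated with a short
sketch. It assumes that a mass inferred from magnified photometry
scales linearly with μ, so that the correction is a shift of -log10(μ)
in log mass, and it propagates a fractional magnification error to
first order. The function names are hypothetical and the 10% default
simply mirrors the typical error quoted above.

```python
import math


def demagnify_log_mass(log_mass_observed, mu, mu_rel_err=0.10):
    """Return the intrinsic log10 stellar mass and the magnification error in dex."""
    log_mass_intrinsic = log_mass_observed - math.log10(mu)
    # First-order propagation: sigma(log10 mu) = (sigma_mu / mu) / ln(10).
    err_dex = mu_rel_err / math.log(10.0)
    return log_mass_intrinsic, err_dex


def line_ratio(flux_a, flux_b, mu):
    """Ratio of two de-magnified line fluxes; mu cancels for a common source."""
    return (flux_a / mu) / (flux_b / mu)
```

A 10% error on μ therefore adds only about 0.04 dex to the mass error
budget, while a line ratio is unchanged by μ.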

3.4. Photometry 

We determine the photometry for the lensed galaxies in
A1689 using 4-band HST imaging data, 1-band MOIRCS
imaging data, and 2-channel Spitzer IRAC data at 3.6 and 4.5
μm.

We obtained a 5,000 s image exposure for A1689 on the 
MOIRCS K s filter, at a depth of 24 mag, using a scale of 
0.117" per pixel. The image was reduced using MCSRED 
in IRAF written by the MOIRCS supporting astronomer Ichi
Tanaka (see footnote 7). The photometry is calibrated using the 2MASS stars
located in the field. 

The ACS F475W, F625W, F775W, F850LP data are obtained
from the HST archive. The HST photometry is determined
using SExtractor (Bertin & Arnouts 1996) with parameters
adjusted to detect the faint background sources. The
F775W filter is used as the detection image using a 1.0"
aperture.

The IRAC data are obtained from the Spitzer archive and
are reduced and drizzled to a pixel scale of 0.6" pixel^-1. In
order to include the IRAC photometry, we convolved the HST
and MOIRCS images with the IRAC point spread functions
(PSFs) derived from unsaturated stars. All photometric data
are measured using a 3.0" radius aperture. Note that we only
consider sources that are not contaminated by nearby bright
galaxies: ~70% of our sources have IRAC photometry
(Table 5). Typical errors for the IRAC band photometry are 0.3
mag, with uncertainties mainly from the aperture correction
and contamination of neighboring galaxies. Typical errors for
the ACS and MOIRCS bands are 0.15 mag, with uncertainties
mainly from the Poisson noise and absolute zero-point
uncertainties (Wuyts et al. 2012). We refer to Richard et al. (2012,
in prep) for the full catalog of the lensing magnification and
photometry of the lensed sources in Abell 1689.

4. SUPPLEMENTARY SAMPLES 

In addition to our lensed targets observed in LEGMS, we 
also include literature data for complementary lensed and 

6 http://www.oamp.fr/cosmology/lenstool
7 http://www.naoj.org/staff/ichi/MCSRED/mcsred.html 
non-lensed samples at both local and high-z. The observa-
tional data for individually measured metallicities at z > 1.5 
are still scarce and caution needs to be taken when using them 
for comparison. The different metallicity and mass derivation 
methods used in different samples can give large systematic 
discrepancies and provide misleading results. For this rea- 
son, we only include the literature data that have robust mea- 
surements and sufficient data for consistently recalculating the 
stellar mass and metallicities using our own methods. Thus,
in general, stacked data and objects with lower/upper limits in
either line ratios or masses are not chosen. The one exception
is the stacked data of Erb et al. (2006), as it is the most widely 
used comparison sample at z ~ 2. 
The samples used in this work are: 

(1) The Sloan Digital Sky Survey (SDSS) sample (z ~
0.07). We use the SDSS sample (Abazajian et al. 2009, 
http://www.mpa-garching.mpg.de/SDSS/DR7/) defined by 
Zahid et al. (2011). The mass derivation method used in 
Zahid et al. (2011) is the same as we use in this work. 
All SDSS metallicities are recalculated using the PP04N2 
method, which uses an empirical fit to the [N II] and Hα line
ratios of H II regions (Pettini & Pagel 2004). 

(2) The Deep Extragalactic Evolutionary Probe 2
(DEEP2) sample (z ~ 0.8). The DEEP2 sample (Davis 
et al. 2003, http://www.deep.berkeley.edu/DR3/) is defined 
in Zahid et al. (2011). At z ~ 0.8, the [N II] and Hα lines
are not available in the optical. We convert the KK04 R23 
metallicity to the PP04N2 metallicity using the prescriptions 
of Kewley & Ellison (2008). 

(3) The UV-selected sample (z ~ 2). We use the stacked 
data of Erb et al. (2006). The metallicity diagnostic used by 
Erb et al. (2006) is the PP04N2 method and no recalculation 
is needed. We offset the stellar mass scale of Erb et al. (2006) 
by -0.3 dex to match the mass derivation method used in 
this work (Zahid et al. 2012). This offset accounts for the 
different initial mass function (IMF) and stellar evolution 
model parameters applied by Erb et al. (2006). 

(4) The lensed sample (1 < z < 3). Besides the 11 lensed 
galaxies from our LEGMS survey in Abell 1689, we include 
1 lensed source (z =1.762) from our MOIRCS data on 
Abell 68 and 1 lensed spiral (z =1.49) from Yuan et al. (2011).
We also include 10 lensed galaxies from Wuyts et al. (2012)
and 3 lensed galaxies from Richard et al. (2011),
since these 13 galaxies have [N II] and Hα measurements,
as well as photometric data for recalculating stellar masses. 
We require all emission lines from literature to have S/N 
> 3 for quantifying the metallicity of 1 < z < 3 galaxies. 
Upper-limit metallicities are found for 6 of the lensed targets 
from our LEGMS survey. Altogether, the lensed sample is 
composed of 25 sources, 12 (6/12 upper limits) of which are 
new observations from this work. Upper-limit metallicities 
are not used in our quantitative analysis. 

The methods used to derive stellar mass and metallicity are 
discussed in detail in Section 5. 
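
For orientation, the two rescalings mentioned in the sample list can be
sketched as follows. The N2 relation used below is the linear
calibration commonly quoted from Pettini & Pagel (2004), 12 + log(O/H)
= 8.90 + 0.57 × N2 with N2 = log([N II] λ6583/Hα); its coefficients are
stated here as an assumption and are not given in the text above. The
-0.3 dex mass offset is the value quoted for the Erb et al. (2006)
sample. Function names are hypothetical.

```python
import math

SOLAR_OH = 8.69             # solar 12 + log(O/H) adopted in the text
ERB06_MASS_OFFSET = -0.3    # dex, offset applied to the Erb et al. (2006) masses


def pp04_n2_metallicity(flux_nii6583, flux_halpha):
    """12 + log(O/H) from the linear N2 calibration (assumed coefficients)."""
    n2 = math.log10(flux_nii6583 / flux_halpha)
    return 8.90 + 0.57 * n2


def rescale_erb06_log_mass(log_mass):
    """Shift an Erb et al. (2006) log stellar mass onto the mass scale used here."""
    return log_mass + ERB06_MASS_OFFSET
```

For instance, a flux ratio [N II]/Hα = 0.1 corresponds to N2 = -1 and,
under this calibration, to an oxygen abundance 0.36 dex below the
adopted solar value.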

5. DERIVED QUANTITIES 

5.1. Optical Classification 

We use the standard optical diagnostic diagram (BPT) to 
exclude targets that are dominated by AGN (Baldwin et al. 
FIG. 1. Left panel: the metallicity distribution of the local SDSS (blue),
intermediate-z DEEP2 (black), and high-z lensed galaxy samples (red). 
Right panel: the stellar mass distribution of the same samples. To present all 
three samples on the same figure, the SDSS (20577 points) and DEEP2 (1635 
points) samples are normalized to 500, and the lensed sample (25 points) is 
normalized to 100. 

1981; Veilleux & Osterbrock 1987; Kewley et al. 2006). For 
all 26 lensed targets in our LEGMS sample, we find 1 tar- 
get that could be contaminated by AGN (B8.2). The fraction 
of AGN in our sample is therefore ~8%, which is similar to 
the fraction (~7%) of the local SDSS sample (Kewley et al. 
2006). We also find that the line ratios of the high-z lensed
sample have a systematic offset on the BPT diagram, as found
in Shapley et al. (2005), Erb et al. (2006), Kriek et al. (2007),
Brinchmann et al. (2008), Liu et al. (2008), and Richard et al.
(2011). The redshift evolution of the BPT diagram will be
reported in Kewley et al. (2013, in preparation).

5.2. Stellar Masses 

We use the software LE PHARE (Ilbert et al. 2009; see footnote 8) to
determine the stellar mass. LE PHARE is a photometric redshift
and simulation package based on the population synthesis
models of Bruzual & Charlot (2003). If the redshift is
known and held fixed, LE PHARE finds the best-fitting SED
through a χ² minimization process and returns physical parameters
such as stellar mass, SFR and extinction. We choose the initial
mass function (IMF) by Chabrier (2003) and the Calzetti
et al. (2000) attenuation law, with E(B-V) ranging from
0 to 2 and an exponentially decreasing SFR (SFR ∝ e^(-t/τ)) with
τ varying between 0 and 13 Gyr. The errors caused by emission
line contamination are taken into account by manually
increasing the uncertainties in the photometric bands where 
emission lines are located. The uncertainties are scaled ac- 
cording to the emission line fluxes measured by MOIRCS. 
The stellar masses derived from the emission line corrected 
photometry are consistent with those without emission line 
correction, albeit with larger errors in a few cases (~ 0.1 dex 
in log space). We use the emission-line corrected photometric 
stellar masses in the following analysis. 

5.3. Metallicity Diagnostics 

The abundance of oxygen (12 + log(O/H)) is used as a 
proxy for the overall metallicity of H II regions in galaxies. 
The oxygen abundance can be inferred from the strong 
recombination lines of hydrogen atoms and collisionally 
excited metal lines (e.g., Kewley & Dopita 2002). Before 
making any metallicity comparisons across different samples 
and redshifts, it is essential to convert all metallicities to the 
same base calibration. The discrepancy among different 
diagnostics can be as large as 0.7 dex for a given mass, large 
enough to mimic or hide any intrinsic observational trends. 
Kewley & Ellison (2008) (KE08) have shown that both the 
shape and the amplitude of the MZ relation change 
substantially with different diagnostics. For this work, we 
convert all metallicities to the PP04N2 method using the 
prescriptions from KE08. 

8 www.cfht.hawaii.edu/~arnouts/LEPHARE/lephare.html 

FIG. 2. — The Zz plot: metallicity history of star-forming galaxies from redshift 0 to 3. The SDSS and DEEP2 samples (black dots) are taken from Zahid 
et al. (2011). The SDSS data are plotted in bins to reduce visual crowding. The lensed galaxies are plotted in blue (upper-limit objects as green arrows), with 
different lensed samples shown with different symbols (see Figure 6 for the legends of the different lensed samples). The purple "bowties" show the bootstrapped 
mean (filled symbol) and median (empty symbol) metallicities and the 1σ standard deviation of the mean and median, whereas the orange dashed error bars 
show the 1σ scatter of the data. For the SDSS and DEEP2 samples, the 1σ errors of the median metallicities are 0.001 and 0.006 (indiscernible in the 
figure), whereas for the lensed sample the 1σ scatter of the median metallicity is 0.067. Upper limits are excluded from the median and error calculations. For 
comparison, we also show the mean metallicity of the UV-selected galaxies from Erb et al. (2006) (symbol: the black bowtie). The 6 panels show samples in 
different mass ranges. The red dotted and dashed lines are the model-predicted median and 1σ scatter (defined as including 68% of the data) of the SFR-weighted 
gas metallicity in simulated galaxies (Dave et al. 2011b). 

For our lensed targets with only [N II] and Hα, we use the 
N2 = log([N II] λ6583/Hα) index, as calibrated by Pettini & 
Pagel (2004) (the PP04N2 method). All lines are required 
to have S/N > 3 for reliable metallicity estimates. Lines 
that have S/N < 3 are presented as 3σ upper limits. For 
targets with only [O II] to [O III] lines, we use the indicator 
R23 = ([O II] λ3727 + [O III] λλ4959, 5007)/Hβ to calculate 
the metallicity. The formalism is given in Kobulnicky & 
Kewley (2004) (KK04 method). The upper and lower branch 
degeneracy of R23 can be broken by the value/upper limit of 
[N II]/Hα. If the upper limit of [N II]/Hα is not sufficient or 
available to break the degeneracy, we calculate both the 
upper and lower branch metallicities and assign the statistical 
errors of the metallicities as the range of the upper and lower 
branches. The KK04 R23 metallicity is then converted to the 
PP04N2 method using the KE08 prescriptions. The line fluxes 
and metallicities are listed in Table 4. For the literature data, 
we have recalculated the metallicities in the PP04N2 scheme. 

The statistical metallicity uncertainties are calculated by 
propagating the flux errors of the [N II] and Hα lines. The 
metallicity calibration of the PP04N2 method itself has a 1σ 
dispersion of 0.18 dex (Pettini & Pagel 2004; Erb et al. 2006). 
Therefore, for individual galaxies that have statistical 
metallicity uncertainties of less than 0.18 dex, we assign 
errors of 0.18 dex. 
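The N2 metallicity and its uncertainty can be sketched in a few lines. The example below is a minimal illustration, not the pipeline used for the lensed sample: it assumes the linear PP04 N2 calibration, 12 + log(O/H) = 8.90 + 0.57 × N2, first-order propagation of uncorrelated flux errors, and the 0.18 dex calibration floor described above. The function name and the example fluxes are hypothetical, and upper limits (S/N < 3) are not handled.

```python
import math

def pp04_n2_metallicity(f_nii, f_ha, err_nii, err_ha, calib_floor=0.18):
    """Return (12 + log(O/H), uncertainty in dex) from [N II] and H-alpha fluxes.

    Assumes the linear PP04 N2 calibration and uncorrelated flux errors.
    """
    n2 = math.log10(f_nii / f_ha)
    metallicity = 8.90 + 0.57 * n2
    # First-order error propagation: sigma(log10 x) = sigma_x / (x ln 10)
    sigma_n2 = math.hypot(err_nii / f_nii, err_ha / f_ha) / math.log(10.0)
    sigma_stat = 0.57 * sigma_n2
    # Statistical errors below the calibration dispersion are raised to the floor
    return metallicity, max(sigma_stat, calib_floor)

# Hypothetical fluxes in arbitrary (but identical) units
z_gas, z_err = pp04_n2_metallicity(f_nii=1.0, f_ha=10.0, err_nii=0.2, err_ha=0.5)
print(f"12 + log(O/H) = {z_gas:.2f} +/- {z_err:.2f}")
```

Because only the flux ratio enters, the absolute flux calibration cancels as long as both lines are measured in the same units.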

Note that we are not comparing absolute metallicities be- 
tween galaxies as they depend on the accuracy of the calibra- 
tion methods. However, by re-calculating all metallicities to 
the same calibration diagnostic, relative metallicities can be 
compared reliably. The systematic error of relative metallici- 
ties is < 0.07 dex for strong-line methods (Kewley & Ellison 
2008). 

6. THE COSMIC EVOLUTION OF METALLICITY FOR 
STAR-FORMING GALAXIES 

6.1. The Zz Relation 

In this section, we present the observational investigation 
into the cosmic evolution of metallicity for star-forming 
galaxies from redshift 0 to 3. The metallicity in the local 
universe is represented by the SDSS sample (20577 objects, 
⟨z⟩ = 0.072 ± 0.016). The metallicity in the intermediate-
redshift universe is represented by the DEEP2 sample (1635 
objects, ⟨z⟩ = 0.78 ± 0.02). For redshift 1 < z < 3, 
we use 19 lensed galaxies (plus 6 upper limit measurements) 
(⟨z⟩ = 1.91 ± 0.61) to infer the metallicity range. 

The redshift distributions for the SDSS and DEEP2 samples 
are very narrow (Δz ~ 0.02), and the mean and median 
redshifts are identical within 0.001. For the lensed sample, 
however, the median redshift is 2.07, which is 0.16 higher 
than the mean redshift. There are two z ≲ 0.9 objects in the 
lensed sample, and if these two objects are excluded, the 
mean and median redshifts for the lensed sample 
are ⟨z⟩ = 2.03 ± 0.54, z_median = 2.09 (see Table 2). 

The overall metallicity distributions of the SDSS, DEEP2, 
and lensed samples are shown in Figure 1. Since the z > 1 
sample size is 2-3 orders of magnitude smaller than the z < 1 
samples, we use a bootstrapping process to derive the mean 
and median metallicities of each sample. Assuming the 
measured metallicity distribution of each sample is 
representative of its parent population, we draw a random 
subset from the initial sample and repeat the process 50000 
times. We use the 50000 replicated samples to measure the 
mean, median and standard deviations of the initial sample. 
This method prevents artifacts from small-number statistics 
and provides robust estimates of the median, mean and 
errors, especially for the high-z lensed sample. 
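The bootstrapping step can be sketched as follows. This is a minimal illustration under stated assumptions: each replicate is drawn with replacement and has the same size as the original sample (the text above does not specify either choice), and the input metallicities are made-up placeholder values, not measurements from the lensed sample. The function name is hypothetical.

```python
import numpy as np

def bootstrap_mean_median(values, n_boot=50000, seed=0):
    """Bootstrap the mean and median of a 1-D sample.

    Returns the bootstrapped mean/median and the standard deviation of
    each statistic over the replicates (the "error of the mean/median").
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = values.size
    # One row per replicate: indices drawn with replacement
    idx = rng.integers(0, n, size=(n_boot, n))
    resampled = values[idx]
    means = resampled.mean(axis=1)
    medians = np.median(resampled, axis=1)
    return {
        "mean": means.mean(),
        "mean_err": means.std(ddof=1),
        "median": medians.mean(),
        "median_err": medians.std(ddof=1),
    }

# Placeholder metallicities (12 + log(O/H)) for a small high-z sample
sample = np.array([8.10, 8.20, 8.25, 8.30, 8.35, 8.40, 8.45, 8.50, 8.15, 8.28])
print(bootstrap_mean_median(sample))
```

For samples as large as SDSS, the index array should be built in chunks to keep memory use modest; the small lensed sample needs no such care.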

The fraction of low-mass (M* < 10^9 M⊙) galaxies is largest 
(31%) in the lensed sample, compared to 9% and 5% in the 
SDSS and DEEP2 samples respectively. Excluding the 
low-mass galaxies does not notably change the median 
metallicity of the SDSS and DEEP2 samples (~ 0.01 dex), 
while it increases the median metallicity of the lensed sample 
by ~ 0.05 dex. To investigate whether the metallicity 
evolution is different for various stellar mass ranges, we 
separate the samples into different mass ranges and derive 
the mean and median metallicities (Table 2). The mass bins 
of 10^9 M⊙ < M* < 10^{9.5} M⊙ and 
10^{9.5} M⊙ < M* < 10^{11} M⊙ are chosen such that there 
are similar numbers of lensed galaxies in each bin. 
Alternatively, the mass bins of 10^9 M⊙ < M* < 10^{10} M⊙ 
and 10^{10} M⊙ < M* < 10^{11} M⊙ are chosen to span equal 
mass scales. 

We plot the metallicity (Z) of all samples as a function of 
redshift z in Figure 2 (dubbed the Zz plot hereafter). The 
first panel shows the complete observational data used in this 
study. The following three panels show the data and model 
predictions in different mass ranges. The samples at local and 
intermediate redshifts are large enough that the 1σ errors 
of the mean and median metallicity are smaller than the 
symbol sizes on the Zz plot (0.001-0.006 dex). Although the 
z > 1 samples are still composed of a relatively small number 
of objects, we suggest that the lensed galaxies and their 
bootstrapped mean and median values more closely represent 
the average metallicities of star-forming galaxies at z > 1 than 
Lyman break or B-band magnitude limited samples, because 
the lensed galaxies are selected based on magnification rather 
than colors. We note, however, that there is still a magnitude 
limit and flux limit for each lensed galaxy. 

We derive the metallicity evolution in units of "dex per 
redshift" and "dex per Gyr" using both the mean and median 
values. The metallicity evolution can be characterized by the 
slope (dZ/dz) of the Zz plot. We compute dZ/dz for two 
redshift ranges: z ~ 0 → 0.8 (SDSS to DEEP2) and 
z ~ 0.8 → ~2.5 (DEEP2 to Lensed galaxies). As a comparison, 
we also derive dZ/dz from z ~ 0.8 to 2.5 using the DEEP2 and 
the Erb06 samples (yellow circles/lines in Figure 3). We derive 
separate evolutions for different mass bins. We show our 
results in Figure 3. 
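The two slopes can be illustrated with a short calculation. The sketch below is not the fitting procedure used for Figure 3: it takes simple two-point differences between sample means, and it converts redshift to lookback time with an assumed flat ΛCDM cosmology (H0 = 70 km s^-1 Mpc^-1, Ωm = 0.3), which is an assumption of this example rather than a value stated in this section. The input values are illustrative numbers similar to the sample means in Table 2.

```python
import numpy as np

H0 = 70.0        # km/s/Mpc (assumed)
OMEGA_M = 0.3    # assumed; flat universe, so Omega_Lambda = 1 - OMEGA_M
HUBBLE_TIME_GYR = 977.8 / H0  # 1/H0 in Gyr

def lookback_time_gyr(z, n_steps=10000):
    """Lookback time to redshift z in Gyr for flat LambdaCDM (trapezoid rule)."""
    zz = np.linspace(0.0, z, n_steps)
    e_z = np.sqrt(OMEGA_M * (1.0 + zz) ** 3 + (1.0 - OMEGA_M))
    integrand = 1.0 / ((1.0 + zz) * e_z)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zz))
    return HUBBLE_TIME_GYR * integral

def enrichment_slopes(z_lo, met_lo, z_hi, met_hi):
    """Return (dZ/dz in dex per unit redshift, metallicity rise in dex per Gyr)."""
    dz_slope = (met_hi - met_lo) / (z_hi - z_lo)
    elapsed = lookback_time_gyr(z_hi) - lookback_time_gyr(z_lo)
    rise_per_gyr = (met_lo - met_hi) / elapsed
    return dz_slope, rise_per_gyr

# Illustrative mean redshifts and metallicities for three epochs
for lo, hi in [((0.071, 8.589), (0.782, 8.459)), ((0.782, 8.459), (1.91, 8.274))]:
    slope, rate = enrichment_slopes(lo[0], lo[1], hi[0], hi[1])
    print(f"z = {hi[0]} -> {lo[0]}: dZ/dz = {slope:.3f}, rise = {rate:.3f} dex/Gyr")
```

A negative dZ/dz corresponds to a positive metallicity rise toward low redshift, which is why the two returned quantities have opposite signs.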

A positive metallicity evolution, i.e., metals enrich galaxies 
from high-z to the local universe, is robustly found in all mass 
bins from z ~ 0.8 → 0. This positive evolution is indicated 
by the negative values of dZ/dz (or dZ/dz(Gyr)) in Figure 3. 
The negative signs (both mean and median) of dZ/dz are 
significant at > 5σ of the measurement errors from 
z ~ 0.8 → 0. From z ~ 2.5 to 0.8, however, dZ/dz is marginally 
smaller than zero at the ~1σ level from the Lensed → DEEP2 
samples. If using the Erb06 → DEEP2 samples, the metallicity 
evolution (dZ/dz) from z ~ 2.5 to 0.8 is consistent with zero 
within ~1σ of the measurement errors. The lack of 
metallicity evolution from the z ~ 2 Erb06 → z ~ 0.8 DEEP2 
samples may be due to the UV-selected sample of Erb06 being 
biased towards more metal-rich galaxies. 

The right column of Figure 3 is used to interpret the 
deceleration/acceleration in metal enrichment. Deceleration 
means the metal enrichment rate (dZ/dz(Gyr) = Δdex Gyr^-1) is 
TABLE 2 

Median/Mean Redshift and Metallicity of the Samples 

                            Metallicity (12 + log(O/H)) 
Sample   Redshift      >10^7 M⊙ (all)  >10^9 M⊙      10^{9-9.5} M⊙  10^{9.5-11} M⊙  10^{9-10} M⊙   10^{10-11} M⊙ 

Mean 
SDSS     0.071±0.016   8.589±0.001     8.616±0.001   8.475±0.002    8.666±0.001     8.589±0.001    8.731±0.001 
DEEP2    0.782±0.018   8.459±0.004     8.464±0.004   8.373±0.006    8.512±0.005     8.425±0.004    8.585±0.006 
Erb06    2.26±0.17     8.418±0.051     8.418±0.050   8.265±0.046    8.495±0.030     8.316±0.052    8.520±0.028 
Lensed   1.91±0.63     8.274±0.045     8.309±0.049   8.296±0.090    8.336±0.066     8.313±0.083    8.309±0.086 

Median 
SDSS     0.07          8.631±0.001     8.646±0.001   8.475±0.003    8.677±0.001     8.617±0.001    8.730±0.001 
DEEP2    0.783         8.465±0.005     8.472±0.006   8.362±0.009    8.537±0.008     8.421±0.008    8.614±0.006 
Erb06    ...           8.459±0.065     8.459±0.065   8.297±0.056    8.515±0.048     8.319±0.008    8.521±0.043 
Lensed   2.07          8.286±0.059     8.335±0.063   8.303±0.106    8.346±0.085     8.313±0.083    8.379±0.094 

NOTE. — The errors for the redshift are the 1σ standard deviation of the sample redshift distribution (not the σ of the 
mean/median). The errors for the metallicity are the 1σ standard deviation of the mean/median from bootstrapping. 



dropping from high-z to low-z. Using our lensed galaxies, 
the mean rise in metallicity is 0.055 ± 0.014 dex Gyr^-1 
for z ~ 2.5 → 0.8, and 0.022 ± 0.001 dex Gyr^-1 for 
z ~ 0.8 → 0. The Mann-Whitney test shows that the mean 
rises in metallicity are larger for z ~ 2.5 → 0.8 than for 
z ~ 0.8 → 0 at a significance level of 95% for the high 
mass bins (10^{9.5} M⊙ < M* < 10^{11} M⊙). For lower mass 
bins, the hypothesis that the metal enrichment rates are the 
same for z ~ 2.5 → 0.8 and z ~ 0.8 → 0 cannot be rejected 
at the 95% confidence level, i.e., there is no statistically 
significant difference in the metal enrichment rates for the 
lower mass bin. Interestingly, if the Erb06 sample is used 
instead of the lensed sample, the hypothesis that the metal 
enrichment rates are the same for z ~ 2.5 → 0.8 and 
z ~ 0.8 → 0 cannot be rejected at the 95% confidence level 
for all mass bins. This means that statistically, the metal 
enrichment rates are the same for z ~ 2.5 → 0.8 and 
z ~ 0.8 → 0 for all mass bins from the 
Erb06 → DEEP2 → SDSS samples. 

The clear trend of the average/median metallicity in 
galaxies rising from high redshift to the local universe is not 
surprising. Observations based on absorption lines have 
shown a continuing fall in metallicity using the damped Lyα 
absorption (DLA) galaxies at higher redshifts (z ~ 2-5) (e.g., 
Songaila & Cowie 2002; Rafelski et al. 2012). There are 
several physical reasons to expect that high-z galaxies are less 
metal-enriched: (1) high-z galaxies are younger, have higher 
gas fractions, and have gone through fewer generations of 
star formation than local galaxies; (2) high-z galaxies may 
still be accreting a large amount of metal-poor pristine gas 
from the environment, and hence have lower average 
metallicities; (3) high-z galaxies may have more powerful 
outflows that drive the metals out of the galaxy. It is likely 
that all of these mechanisms have played a role in diluting 
the metal content at high redshifts. 

6.2. Comparison between the Zz Relation and Theory 

We compare our observations with model predictions from 
the cosmological hydrodynamic simulations of Dave et al. 
(2011a,b). These models are built within a canonical 
hierarchical structure formation context. The models take 
into account the important feedback of outflows by 
implementing an observation-motivated momentum-driven 
wind model (Oppenheimer & Dave 2008). The effects of 
inflows and mergers are included in the hierarchical structure 
formation of the simulations. Galactic outflows are dealt with 
specifically in the momentum-driven wind models. Dave & 
Oppenheimer (2007) found that the outflows are key to 
regulating metallicity, while inflows play a second-order 
regulation role. 

The model of Dave et al. (2011a) focuses on the metal 
content of star-forming galaxies. Compared with the previ- 
ous work of Dave & Oppenheimer (2007), the new simula- 
tions employ the most up-to-date treatment for supernova and 
AGB star enrichment, and include an improved version of the 
momentum-driven wind models (the vzw model) where the 
wind properties are derived based on host galaxy masses (Op- 
penheimer & Dave 2008). The model metallicity in Dave 
et al. (2011a) is defined as the SFR-weighted metallicity of 
all gas particles in the identified simulated galaxies. This 
model metallicity can be compared directly with the metal- 
licity we observe in star-forming galaxies after a constant off- 
set normalization to account for the uncertainty in the abso- 
lute metallicity scale (Kewley & Ellison 2008). The offset is 
obtained by matching the model metallicity with the SDSS 
metallicity. Note that the model has a galaxy mass resolution 
limit of M* ~ 10^9 M⊙. For the Zz plot, we normalize 
the model metallicity with the median SDSS metallicity 
computed from all SDSS galaxies > 10^9 M⊙. For the MZ 
relation in Section 7, we normalize the model metallicity with 
the SDSS metallicity at the stellar mass of 10^{10} M⊙. 

We compute the median metallicities of the Dave et al. 
(2011a) model outputs in redshift bins from z = 0 to z = 3 
with an increment of 0.1. The median metallicities with 1σ 
spread (defined as including 68% of the data) of the model at 
each redshift are overlaid on the observational data in the Zz 
plot. 

FIG. 3. — Cosmic metal enrichment rate (dZ/dz) in two redshift (cosmic time) epochs. dZ/dz is defined as the slope of the Zz relation. The left column shows dZ/dz 
in units of Δdex per redshift whereas the right column is in units of Δdex per Gyr. We derive dZ/dz for the SDSS to the DEEP2 (z ~ 0 to 0.8), and the DEEP2 
to the Lensed (z ~ 0.8 to 2.0) samples respectively (black squares/lines). As a comparison, we also derive dZ/dz from z ~ 0.8 to 2.0 using the DEEP2 and the 
Erb06 samples (yellow circles/lines). Filled and empty squares are results from the mean and median quantities. The model prediction (using the median) from the 
cosmological hydrodynamical simulation of Dave et al. (2011a) is shown as red stars. The second to fifth rows show dZ/dz in different mass ranges. The first row 
illustrates the interpretation of dZ/dz in redshift and cosmic time frames. A negative value of dZ/dz means a positive metal enrichment from high redshift to the local 
universe. The negative slope of dZ/dz versus cosmic time (right column) indicates a deceleration in metal enrichment from high-z to low-z. 

We compare our observations with the model prediction in 
3 ways: 

(1) We compare the observed median metallicity with the 
model median metallicity. We see that for the lower mass 
bins (10^{9-9.5}, 10^{9-10} M⊙), the median of the model 
metallicity is consistent with the median of the observed 
metallicity within the observational errors. However, for 
higher mass bins, the model over-predicts the metallicity at 
all redshifts. The over-prediction is most significant in the 
highest mass bin of 10^{10-11} M⊙, where the Student's 
t-statistic shows that the model distributions have 
significantly different means than the observational data at all 
redshifts, with a probability of being a chance difference of 
< 10^{-8}, < 10^{-8}, 1.7%, 5.7% for SDSS, DEEP2, the Lensed, 
and the Erb06 samples respectively. For the alternative 
high-mass bin of 10^{9.5-11} M⊙, the model also over-predicts 
the observed metallicity except for the Erb06 sample, with a 
chance difference between the model and observations of 
< 10^{-8}, < 10^{-8}, 1.7%, 8.9%, 93% for SDSS, DEEP2, the 
Lensed, and the Erb06 samples respectively. 

(2) We compare the scatter of the observed metallicity 
(orange error bars on the Zz plot) with the scatter of the 
models (red dashed lines). For the full samples, the 1σ scatter 
of the data from the SDSS (z ~ 0), DEEP2 (z ~ 0.8), and the 
Lensed sample (z ~ 2) is 0.13, 0.15, and 0.15 dex, whereas 
the 1σ model scatter is 0.23, 0.19, and 0.14 dex. We find that 
the observed metallicity scatter increases systematically as a 
function of redshift for the high mass bins whereas the model 
does not predict such a trend: 0.10, 0.14, 0.17 dex (cf. model 
0.17, 0.15, 0.12 dex) for 10^{9.5-11} M⊙, and 0.07, 0.12, 
0.18 dex (cf. model 0.12, 0.11, 0.10 dex) for 10^{10-11} M⊙, 
from SDSS → DEEP2 → the Lensed sample. Our observed 
scatter is in line with the work of Nagamine et al. (2001), in 
which the predicted stellar metallicity scatter increases with 
redshift. Note that our lensed samples are still small and have 
large measurement errors in metallicity (~ 0.2 dex). The 
discrepancy between the observed scatter and models needs 
to be further confirmed with a larger sample. 

(3) We compare the observed slope (dZ/dz) of the Zz plot 
with the model predictions (Figure 3). We find the observed 
dZ/dz is consistent with the model prediction within the 
observational errors for the undivided sample of all masses 
> 10^9 M⊙. However, when divided into mass bins, the model 
predicts a slower enrichment than observations from 
z ~ 0 → 0.8 for the lower mass bin of 10^{9-9.5} M⊙, and 
from z ~ 0.8 → 2.5 for the higher mass bin of 
10^{9.5-11} M⊙ at a 95% significance level. 

Dave et al. (2011) showed that their models over-predict the 
metallicities for the highest mass galaxies in the SDSS. They 
suggested that either (1) an additional feedback mechanism 
might be needed to suppress star formation in the most mas- 
sive galaxies; or (2) wind recycling may be bringing in highly 
enriched material that elevates the galaxy metallicities. It is 
unclear from our data which (if any) of these interpretations 
is correct. Additional theoretical investigations specifically 
focusing on metallicities in the most massive active galaxies 
are needed to determine the true nature of this discrepancy. 



7. EVOLUTION OF THE MASS-METALLICITY RELATION 

7.1. The Observational Limit of the Mass-Metallicity 
Relation 

For the N2 based metallicity, there is a limiting metallicity 
below which the [N II] line is too weak to be detected. Since 
[N II] is the weakest of the Hα+[N II] lines, it is the flux of 
[N II] that drives the metallicity detection limit. 
Thus, for a given instrument sensitivity, there is a region on 
the mass-metallicity relation that is observationally 
unobtainable. Based on a few simple assumptions, we can 
derive the boundary of this region as follows. 

Observations have shown that there is a positive correlation 
between the stellar mass M* and SFR (Noeske et al. 2007b; 
Elbaz et al. 2011; Wuyts et al. 2011). One explanation for 
the M* vs. SFR relation is that more massive galaxies have 
an earlier onset of initial star formation with shorter timescales 
of exponential decay (Noeske et al. 2007a; Zahid et al. 2012). 
The shape and amplitude of the SFR vs. M* relation at 
different redshifts z can be characterized by two parameters 
δ(z) and γ(z), where δ(z) is the logarithm of the SFR at 
M* = 10^{10} M⊙ and γ(z) is the power law index 
(Zahid et al. 2012). 
FIG. 4. — SFR vs. stellar mass relation. The light-blue, blue, and red lines 
show the best-fit SFR vs. stellar mass relation from the SDSS, DEEP2, and 
Erb06 samples respectively (Zahid et al. 2011; see also Table 3). Black dots 
are the lensed sample used in this work. The SFR for the lensed sample is 
derived from the Hα flux with dust extinction corrected from the SED fitting. 
The errors on the SFR of the lensed sample are statistical errors of the Hα 
fluxes. Systematic errors of the SFR can be large (a factor of 2-3) for our 
lensed galaxies due to complicated aperture effects (Section 3.1). 

The relationship between the SFR and M* then becomes: 

    \log_{10}[\mathrm{SFR}(z)] = \delta(z) + \gamma(z)\,[\log_{10}(M_*/M_\odot) - 10]    (1) 

As an example, we show in Figure 4 the SFR vs. M* relation 
at three redshifts (z ~ 0, 0.8, 2). The best-fit values of 
δ(z) and γ(z) are listed in Table 3. 

Using the Kennicutt (1998) relation between SFR and Hα: 

    \mathrm{SFR} = 7.9 \times 10^{-42}\, L(\mathrm{H}\alpha)\ [\mathrm{ergs\ s^{-1}}]    (2) 

and the N2 metallicity calibration (Pettini & Pagel 2004): 

    12 + \log(\mathrm{O/H}) = 8.90 + 0.57 \times \log_{10}([\mathrm{N\,II}]/\mathrm{H}\alpha),    (3) 

we can then derive a metallicity detection limit. We combine 
Equations (1), (2) and (3), and assume the [N II] flux is greater 
than the instrument flux detection limit. We provide the 
detection limit for the PP04N2 diagnosed MZ relation: 

    Z_{\rm met} > [\log_{10}(f_{\rm inst}/\mu) + 2\log_{10} D_L(z) - \gamma(z)\log_{10}(M_*/M_\odot) - \beta(z) + \log_{10}(4\pi)]\,0.57 + 8.9    (4) 

where: 

    \beta(z) = \delta(z) - 10\,\gamma(z) + 42 - \log_{10} 7.9;    (5) 

δ(z) and γ(z) are defined in Equation (1); f_inst is the 
instrument flux detection limit in ergs s^-1 cm^-2; μ is the 
lensing magnification in flux; D_L(z) is the luminosity 
distance in cm. 
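The detection limit of Equations (4) and (5) can be evaluated numerically. The sketch below is an illustration under stated assumptions: the luminosity distance is computed for a flat ΛCDM cosmology with H0 = 70 km s^-1 Mpc^-1 and Ωm = 0.3 (assumed here, not quoted in this section), the stellar mass enters as log10(M*/M⊙), and the function names and the example flux limit are hypothetical choices for this example.

```python
import numpy as np

MPC_IN_CM = 3.0857e24
C_KM_S = 299792.458

def luminosity_distance_cm(z, h0=70.0, omega_m=0.3, n_steps=10000):
    """Luminosity distance in cm for flat LambdaCDM (trapezoid integration)."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    comoving_mpc = (C_KM_S / h0) * integral
    return (1.0 + z) * comoving_mpc * MPC_IN_CM

def n2_metallicity_limit(log_mstar, z, delta, gamma, f_inst, mu=1.0):
    """Lowest detectable 12 + log(O/H) from Equations (4) and (5).

    log_mstar : log10(M*/Msun)
    delta, gamma : SFR-M* relation parameters of Equation (1)
    f_inst : instrument flux limit in ergs/s/cm^2
    mu : lensing flux magnification (1.0 means no lensing)
    """
    beta = delta - 10.0 * gamma + 42.0 - np.log10(7.9)
    d_l = luminosity_distance_cm(z)
    n2_min = (np.log10(f_inst / mu) + 2.0 * np.log10(d_l)
              - gamma * log_mstar - beta + np.log10(4.0 * np.pi))
    return 0.57 * n2_min + 8.9

# Illustrative inputs: delta and gamma of a z ~ 2.26 sample, a 1.2e-17 flux limit
for mu in (1.0, 4.0):
    limit = n2_metallicity_limit(10.0, 2.26, 1.657, 0.48, 12.0e-18, mu=mu)
    print(f"mu = {mu}: Z_met limit at 10^10 Msun = {limit:.2f}")
```

Two properties follow directly from Equation (4): magnifying by μ lowers the limit by 0.57 log10(μ) dex, and the limit falls by 0.57 γ dex per dex of stellar mass.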

The slope of the mass-metallicity detection limit is related 
to the slope of the SFR-mass relation, whereas the y-intercept 
of the slope depends on the instrument flux limit (and flux 
magnification for gravitational lensing), redshift, and the y- 
intercept of the SFR-mass relation. 

Note that the exact location of the boundary depends on 
the input parameters of Equation 4. As an example, we use 
the δ(z) and γ(z) values of the Erb06 and Lensed samples 
respectively (Table 3). We show the detection boundary for 
three current and future NIR instruments: Subaru/MOIRCS, 
KECK/NIRSPEC and JWST/NIRSpec. The instrument flux 
FIG. 5. — The instrument detection limit on the MZ relation. We give the dependence of this detection limit in Equation 4. Shown here are examples of 
the detection limit based on given parameters specified as follows. The solid lines use the parameters based on the mass-SFR relation of the Erb06 sample: 
δ = 1.657 and γ = 0.48 at z = 2.26. The dashed lines use the parameters based on the mass-SFR relation of the Lensed sample: δ = 2.02 and γ = 0.69 
at z = 2.07 (see Figure 4; Table 3). The parameters adopted for the instrument flux limit are given in Section 7.1. The lensing magnification (μ) is fixed 
at 1.0 (i.e., non-lensing cases) for Subaru/MOIRCS (blue lines) and JWST/NIRSpec (light blue). The red lines show the detection limits for KECK/NIRSPEC 
with different magnifications. Black filled triangles show the Erb et al. (2006) sample. We show that stacking and/or lensing magnification can help to push the 
observational boundary of the MZ relation to lower mass and metallicity regions. For example, Erb et al. (2006) used stacked NIRSPEC spectra with N ~ 15 
spectra in each mass bin. The effect of stacking (N ~ 15 per bin) is similar to observing with a lensing magnification of μ ~ 4. 



TABLE 3 

Fit to the SFR-Stellar Mass Relation 

Sample              Redshift (Mean)   δ              γ 

SDSS                0.072             0.317±0.003    0.71±0.01 
DEEP2               0.78              0.795±0.009    0.69±0.02 
Erb06               2.26              1.657±0.027    0.48±0.06 
Lensed (Wuyts12)    1.69              2.93±1.28      1.47±0.14 
Lensed (all)        2.07              2.02±0.83      0.69±0.09 

NOTE. — The SFR vs. stellar mass relations at different redshifts can be 
characterized by two parameters δ(z) and γ(z), where δ(z) is the logarithm of 
the SFR at M* = 10^{10} M⊙, and γ(z) is the power law index. The best fits for the 
non-lensed samples are adopted from Zahid et al. (2012). The best fits for the lensed 
sample are calculated for the Wuyts et al. (2012) sample and the whole lensed 
sample separately. 

detection limit is based on a background-limited estimate in 
10^5 seconds (fluxes below are in units of 10^{-18} ergs s^-1 
cm^-2). For Subaru/MOIRCS (low resolution mode, HK500), we 
adopt f_inst = 23.0 based on the 1σ uncertainty of our MOIRCS 
spectrum (flux = 4.6 in 10 hours), scaled to 3σ in 10^5 seconds. 
For KECK/NIRSPEC, we use f_inst = 12.0, based on the 1σ 
uncertainty of Erb et al. (2006) (flux = 3.0 in 15 hours), scaled 
to 3σ in 10^5 seconds. For JWST/NIRSpec, we use f_inst = 
0.17, scaled to 3σ in 10^5 seconds (footnote 9). 

Since lensing flux magnification is equivalent to lowering 
the instrument flux detection limit, we see that with a lensing 
magnification of ~55, we reach the sensitivity of JWST 
using KECK/NIRSPEC. Stacking can also push the 
observations below the instrument flux limit. For instance, 
the z ~ 2 Erb et al. (2006) sample was obtained from stacking 
the NIRSPEC spectra of 87 galaxies, with ~ 15 spectra in each 
mass bin; thus the Erb06 sample has been able to probe ~ 4 
times deeper than the nominal detection boundary of NIRSPEC. 

9 http://www.stsci.edu/jwst/instruments/nirspec/sensitivity/ 
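The factor of ~ 4 quoted for stacking follows from simple noise averaging. The sketch below assumes background-limited spectra with equal, uncorrelated noise, so that the noise of a stack of N spectra falls as 1/sqrt(N); under that assumption the depth gain plays the same role as a lensing flux magnification. The function name is hypothetical.

```python
import math

def stacking_depth_gain(n_spectra):
    """Factor by which the flux limit drops when stacking n_spectra spectra.

    Assumes equal, uncorrelated, background-limited noise in every spectrum.
    """
    if n_spectra < 1:
        raise ValueError("n_spectra must be at least 1")
    return math.sqrt(n_spectra)

gain = stacking_depth_gain(15)
print(f"Stacking 15 spectra is comparable to a magnification of about {gain:.1f}")
```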

The observational detection limit on the MZ relation is 
important for understanding the incompleteness and biases of 
samples due to observational constraints. However, we 
caution that the relation between Z_met and M* in Equation 4 
will have significant intrinsic dispersion due to variations in 
the observed properties of individual galaxies. This includes 
scatter in the M*-SFR relation, the N2 metallicity calibration, 
the amount of dust extinction, and variable slit losses in 
spectroscopic observations. For example, a scatter of 0.8 dex 
in δ for the lensed sample (Table 3) implies a scatter of 
approximately 0.5 dex in Z_met. In addition, Equations 2 and 4 
include implicit assumptions of zero dust extinction and no 
slit loss, such that the derived line flux is overestimated (and 
Z_met is underestimated). Because of the above uncertainties 
and biases in the assumptions we made, Equation 4 should be 
used with due caution. 

7.2. The Evolution of the MZ Relation 

Figure 6 shows the mass and metallicity measured from the 
SDSS, DEEP2, and our lensed samples. The Erb et al. (2006) 
(Erb06) stacked data are also included for comparison. We 
highlight a few interesting features in Figure 6: 
(1) To first order, the MZ relation still exists at z ~ 2, 
i.e., more massive systems are more metal rich. The Pearson 
correlation coefficient is r = 0.33349, with a probability of 
being a chance correlation of P = 17%. A simple linear fit 
to the lensed sample yields a slope of 0.164±0.033, with a 
y-intercept of 6.8±0.3. 

(2) All z > 1 samples show evidence of evolution to lower 
metallicities at fixed stellar masses. At high stellar mass 
(M* > 10^{10} M⊙), the lensed sample has a mean metallicity 
and a standard deviation of the mean of 8.41±0.05, whereas 
the mean and standard deviation of the mean for the Erb06 
sample is 8.52±0.03. The lensed sample is offset to lower 
metallicity by 0.11±0.06 dex compared to the Erb06 sample. 
This slight offset may indicate the selection difference 
between the UV-selected (potentially more dusty and metal 
rich) sample and the lensed sample (less biased towards UV 
bright systems). 

(3) At lower mass (M* < 10^{9.4} M⊙), our lensed sample 
provides 12 individual metallicity measurements at z > 1. 
The mean metallicity of the galaxies with M* < 10^{9.4} M⊙ is 
8.25±0.05, roughly consistent with the < 8.20 upper limit of 
the stacked metallicity of the lowest mass bin (M* ≲ 10^{9.1} 
M⊙) of the Erb06 galaxies. 

(4) Compared with the Erb06 galaxies, there is a lack of 
the highest mass galaxies in our lensed sample. We note 
that there is only 1 object with M* > 10^{10.4} M⊙ among 
all three lensed samples combined. The lensed sample 
is less affected by the color selection and may be more 
representative of the mass distribution of high-z galaxies. In 
the hierarchical galaxy formation paradigm, galaxies grow 
their masses with time. The number density of massive 
galaxies at high redshift is smaller than at z ~ 0, thus 
the number of massive lensed galaxies is small. Selection 
criteria such as the UV-color selection of the Erb06 and 
SINS (Genzel et al. 2011) galaxies can be applied to target 
the high-mass galaxies on the MZ relation at high redshift. 

7.3. Comparison with Theoretical MZ Relations 

Understanding the origins of the MZ relation has been the 
driver of copious theoretical work. Based on the idea that 
metallicities are mainly driven by an equilibrium among stel- 
lar enrichment, infall and outflow, Finlator & Dave (2008) de- 
veloped smoothed particle hydrodynamic simulations. They 
found that including a momentum-driven wind model 
(vzw) gives the best fit to the z ~ 2 MZ relation compared with 
other outflow/wind models. The updated version of their vzw 
model is described in detail in Dave et al. (2011a). We over- 
lay the Dave et al. (2011a) vzw model outputs on the MZ 
relation in Figure 7. We find that the model does not repro- 
duce the MZ redshift evolution seen in our observations. We 
provide possible explanations as follows. 

Kewley & Ellison (2008) found that both the shape and 
scatter of the MZ relation vary significantly among different 
metallicity diagnostics. This poses a tricky normalization 
problem when comparing models to observations. For 
example, a model output may fit the MZ relation slope from 
one strong-line diagnostic, but fail to fit the MZ relation from 
another diagnostic, which may have a very different slope. 
This is exactly what we are seeing in the left panel of 
Figure 7. Dave et al. (2011a) applied a constant offset to the 
model metallicities by matching the amplitude of the model 
MZ relation at z ~ 0 with the observed local MZ relation of 
Tremonti et al. (2004, T04) at the stellar mass of 10^{10} M⊙. 
Dave et al. (2011a) found that the characteristic shape and 
scatter of the MZ relation from the vzw model match the 
T04 MZ relation between 10^9 M⊙ < M* < 10^{11} M⊙ within 
the 1σ model and observational scatter. However, since both 
the slope and amplitude of the T04 SDSS MZ relation are 
significantly larger than the SDSS MZ relation derived using 
the PP04N2 method (Kewley & Ellison 2008), the 
PP04N2-normalized MZ relation from the model does not 
recover the local MZ relation within 1σ. 

In addition, the stellar mass measurements from different 
methods may cause systematic offsets in the x-direction of 
the MZ relation (Zahid et al. 2011). As a result, even though 
the shape, scatter, and evolution with redshift are independent 
predictions from the model, systematic uncertainties in 
metallicity diagnostics and stellar mass estimates do not 
allow the shape to be constrained separately. 

In the right panel of Figure 7, we allow the model slope ($\alpha$), 
metallicity amplitude ($Z$), and stellar mass ($M_*$) to change 
slightly so that the model fits the local SDSS MZ relation. 
Assuming that this change in slope ($\Delta\alpha$) and in the y and x 
amplitudes ($\Delta Z$, $\Delta M_*$) is caused by the systematic offsets in 
observations, the same $\Delta\alpha$, $\Delta Z$, and $\Delta M_*$ can be applied to 
model MZ relations at other redshifts. Although normalizing 
the model MZ relation in this way makes the model lose 
predictive power for the shape of the MZ relation, it at least 
leaves the redshift evolution of the MZ relation as a testable 
model output. 
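
The normalization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `apply_offsets`, the pivot mass about which the slope change is applied, and all numerical values are hypothetical and are not taken from the model of Dave et al. (2011a) or from our fits. The sketch shows why a common set of offsets removes the shape information but leaves the redshift evolution at fixed mass unchanged.

```python
def apply_offsets(log_mass, metallicity, d_alpha, d_z, d_mass, pivot=10.0):
    """Apply common slope, amplitude, and mass offsets to model points.

    log_mass    : log10(M*/Msun) of each model galaxy
    metallicity : 12 + log(O/H) of each model galaxy
    pivot       : log-mass about which the slope change acts (assumed)
    """
    new_mass = [m + d_mass for m in log_mass]
    new_metallicity = [
        z + d_z + d_alpha * (m - pivot) for m, z in zip(log_mass, metallicity)
    ]
    return new_mass, new_metallicity


# Hypothetical model points at two redshifts on a common mass grid.
log_mass = [9.0, 9.5, 10.0, 10.5]
model_z0 = [8.55, 8.70, 8.85, 8.95]
model_z2 = [8.30, 8.45, 8.60, 8.72]

# Offsets that would be fitted at z ~ 0 (made-up numbers).
offsets = {"d_alpha": -0.05, "d_z": -0.20, "d_mass": 0.10}

mass_z0, norm_z0 = apply_offsets(log_mass, model_z0, **offsets)
mass_z2, norm_z2 = apply_offsets(log_mass, model_z2, **offsets)

# The evolution at fixed mass is the same before and after normalization.
evolution_raw = [a - b for a, b in zip(model_z0, model_z2)]
evolution_norm = [a - b for a, b in zip(norm_z0, norm_z2)]
```

Because the same offsets are applied at every redshift, `evolution_raw` and `evolution_norm` agree to rounding error, which is the property exploited in the right panel of Figure 7.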

Despite the normalization correction, we see from Figure 7 
that the models predict less evolution from z ~ 2 to z ~ 0 than 
the observed MZ relation. To quantify this, we divide the model 
data into two mass bins and derive the mean and $1\sigma$ scatter in 
each mass bin as a function of redshift. We define the "mean 
evolved metallicity" on the MZ relation as the difference 
between the mean metallicity at redshift z and the mean 
metallicity at z ~ 0 at a fixed stellar mass (log(O/H)[z ~ 0] - 
log(O/H)[z ~ 2]). The "mean evolved metallicity" errors are 
calculated based on the standard errors of the mean. 
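
As a minimal sketch of this definition, the snippet below computes the difference of the mean metallicities in one mass bin together with an uncertainty. The quadrature combination of the two standard errors is an assumption made for illustration (the text only states that the errors are based on the standard errors of the mean), and the function names and input values are hypothetical.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the mean and the standard error of the mean of a sample."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def mean_evolved_metallicity(local_values, high_z_values):
    """Difference of mean metallicities, log(O/H)[z~0] - log(O/H)[z].

    Both inputs are lists of 12 + log(O/H) in the same stellar mass bin.
    The two standard errors are combined in quadrature (assumed).
    """
    mean_local, sem_local = mean_and_sem(local_values)
    mean_high, sem_high = mean_and_sem(high_z_values)
    return mean_local - mean_high, math.hypot(sem_local, sem_high)


# Hypothetical metallicities in a single mass bin.
delta, delta_err = mean_evolved_metallicity([8.6, 8.7, 8.8], [8.3, 8.4, 8.5])
```

With these made-up inputs the mean evolved metallicity is 0.3 dex; real samples would of course contain many more galaxies per bin.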

In Figure 8 we plot the "mean evolved metallicity" 
as a function of redshift for two mass bins: 
$10^{9}\,M_\odot < M_* < 10^{9.5}\,M_\odot$ and $10^{9.5}\,M_\odot < M_* < 10^{11}\,M_\odot$. We 
calculate the observed "mean evolved metallicity" for DEEP2 
and our lensed sample in the same mass bins. We see that 
the observed mean evolution of the lensed sample is highly 
uncertain, so no conclusion can be drawn from comparing the 
model with these data. However, the DEEP2 data are 
well-constrained and can be compared with the model. 

We find that at z ~ 0.8, the mean evolved metallicity of 
the high-mass galaxies is consistent with the mean evolved 
metallicity of the models. The observed mean evolved 
metallicity of the low-mass bin galaxies is ~ 0.12 dex larger than 
the mean evolved metallicity of the models in the same mass 
bin. 

8. COMPARISON WITH PREVIOUS WORK IN THE LITERATURE 

In this Section, we compare our findings with previous 
work on the evolution of the MZ relation. 

For low masses ($10^{9}\,M_\odot$), we find a larger enrichment (i.e., 
smaller decrease in metallicity) between z ~ 2 → 0 than 
either the non-lensed sample of Maiolino et al. (2008) (0.15 
dex c.f. 0.6 dex) or the lensed sample of Wuyts et al. (2012); 
FIG. 6. - Left: the observed MZ relation. Black symbols are the lensed galaxy sample at z > 1. Specifically, the squares are from this work; the stars are from 
Wuyts et al. (2012), and the diamonds are from Richard et al. (2011). The orange triangles show the Erb et al. (2006) sample. The local SDSS relation and its 
1-sigma range are drawn in purple lines. The z ~ 0.8 DEEP2 relations from Zahid et al. (2011) are drawn in purple dots. Right: the best fit to the MZ relation. 
A second degree polynomial function is fit to the SDSS, DEEP2, and Erb06 samples. A simple linear function is fit to the lensed sample. The z > 1 lensed 
data are binned in 5 mass bins (symbol: red star) and the median and $1\sigma$ standard deviation of each bin are plotted on top of the linear fit. 

FIG. 7. - Model predictions of the MZ relation. The data symbols are the same as those used in Figure 6. The small green and light blue dots are the 
cosmological hydrodynamic simulations with momentum-conserving wind models from Dave et al. (2011a). The left and right panels differ in the 
normalization method used. The left panel normalizes the model metallicity to the observed SDSS values by applying a constant offset at 
$M_{\rm star} \sim 10^{10}\,M_\odot$, whereas the right panel normalizes the model metallicity to the observed SDSS metallicity by allowing a constant shift in the slope, 
amplitude and stellar mass. Note that the model has a mass cut off at $1.1 \times 10^{9}\,M_\odot$. 



Richard et al. (2011) (0.4 dex). These discrepancies may 
reflect differences in the metallicity calibrations applied. It is clear 
that a larger sample is required to characterize the true mean 
and spread in metallicities at intermediate redshift. Note that 
the lensed samples are still small and have large measurement 
errors in both stellar mass (0.1 to 0.5 dex) and metallicity 
(~ 0.2 dex). 

For high masses ($10^{10}\,M_\odot$), we find similar enrichment (0.4 
dex) between z ~ 2 → 0 compared to the non-lensed sample 
of Maiolino et al. (2008) and the lensed samples of Wuyts et al. 
(2012) and Richard et al. (2011). 

We find in Section 6.1 that the deceleration in metal 
enrichment is significant in the highest mass bin 
($10^{9.5}\,M_\odot < M_* < 10^{11}\,M_\odot$) of our samples. The deceleration in 
metal enrichment from z ~ 2 → 0.8 to z ~ 0.8 → 0 is 
consistent with the picture that the star formation and mass 
assembly peak between redshift 1 and 3 (Hopkins & Beacom 
2006). The deceleration is larger by 0.019±0.013 dex Gyr$^{-2}$ 
in the high mass bin, suggesting a possible mass-dependence 
in chemical enrichment, similar to the "downsizing" 
mass-dependent growth of stellar mass (Cowie et al. 1996; Bundy 
et al. 2006). In the downsizing picture, more massive galaxies 
formed their stars earlier and on shorter timescales compared 
with less massive galaxies (Noeske et al. 2007a). Our 
observation of the chemical downsizing is consistent with previous 
metallicity evolution work (Panter et al. 2008; Maiolino et al. 



redshift. The mean metallicity falls by ~ 0.18 dex from 
redshift 0 to 1 and falls further by ~ 0.16 dex from 
redshift 1 to 2. 

• A more rapid evolution is seen between z ~ 1 → 3 
than z ~ 0 → 1 for the high-mass galaxies 
($10^{9.5}\,M_\odot < M_* < 10^{11}\,M_\odot$), with almost twice as much 
enrichment between z ~ 1 → 3 than between z ~ 1 → 0. 

• The deceleration in metal enrichment from z ~ 2 → 
0.8 to z ~ 0.8 → 0 is significant in the high-mass 
galaxies ($10^{9.5}\,M_\odot < M_* < 10^{11}\,M_\odot$), consistent with 
a mass-dependent chemical enrichment. 




FIG. 8. - The "mean evolved metallicity" as a function of redshift for two 
mass bins (indicated by four colors). Dashed lines show the median and $1\sigma$ 
scatter of the model prediction from Dave et al. (2011a). The observed data 
from DEEP2 and our lensed sample are plotted as filled circles. 

2008; Richard et al. 2011; Wuyts et al. 2012). 

We find that for higher mass bins, the model of Dave et al. 
(2011a) over-predicts the metallicity at all redshifts. The 
over-prediction is most significant in the highest mass bin of 
$10^{10-11}\,M_\odot$. This conclusion is similar to the findings in Dave 
et al. (2011a,b). In addition, we point out that when comparing 
the model metallicity with the observed metallicity, there 
is a normalization problem stemming from the discrepancy 
among different metallicity calibrations (Section 7.3). 

We note that the evolution of the MZ relation is based on an 
ensemble of the averaged SFR-weighted metallicity of the 
star-forming galaxies at each epoch. The MZ relation does not 
reflect an evolutionary track of individual galaxies. We are 
probably seeing a different population of galaxies at each 
redshift (Brooks et al. 2007; Conroy et al. 2008). For example, a 
$\sim 10^{10.5}\,M_\odot$ massive galaxy at z ~ 2 will most likely evolve 
into an elliptical galaxy in the local universe and will not 
appear on the local MZ relation. On the other hand, to trace the 
progenitor of a $\sim 10^{11}\,M_\odot$ massive galaxy today, we need to 
observe a $\sim 10^{9.5}\,M_\odot$ galaxy at z ~ 2 (Zahid et al. 2012). 

It is clear that gravitational lensing has the power to probe 
lower stellar masses than current color selection techniques. 
Larger lensed samples with high-quality observations are re- 
quired to reduce the measurement errors. 

9. SUMMARY 

To study the evolution of the overall metallicity and MZ 
relation as a function of redshift, it is critical to remove the 
systematics among different redshift samples. The major caveats 
in current MZ relation studies at z > 1 are: (1) metallicity 
is not based on the same diagnostic method; (2) stellar mass 
is not derived using the same method; (3) the samples are 
selected differently and selection effects on mass and metallicity 
are poorly understood. In this paper, we attempt to minimize 
these issues by re-calculating the stellar mass and metallicity 
consistently, and by expanding the lens-selected sample at 
z > 1. We aim to present a reliable observational picture of the 
metallicity evolution of star forming galaxies as a function of 
stellar mass between 0 < z < 3. We find that: 

• There is a clear evolution in the mean and median 
metallicities of star-forming galaxies as a function of 



• We compare the metallicity evolution of star-forming 
galaxies from z = 0 → 3 with the most recent 
cosmological hydrodynamic simulations. We see that 
the model metallicity is consistent with the observed 
metallicity within the observational error for the low 
mass bins. However, for higher mass bins, the model 
over-predicts the metallicity at all redshifts. The 
over-prediction is most significant in the highest mass bin of 
$10^{10-11}\,M_\odot$. Further theoretical investigation into the 
metallicity of the highest mass galaxies is required to 
determine the cause of this discrepancy. 

• The median metallicity of the lensed sample is 
0.35±0.06 dex lower than local SDSS galaxies and 
0.28±0.06 dex lower than the z ~ 0.8 DEEP2 galaxies. 

• The cosmological hydrodynamic simulations of Dave et al. 
(2011a) do not agree with the evolution of the 
observed MZ relation based on the PP04N2 diagnostic. 
Whether the model fits the slope of the MZ relation 
depends on the normalization method used. 

This study is based on 6 clear nights of observations on an 
8-meter telescope, highlighting the efficiency of using 
lens-selected targets. However, the lensed sample at z > 1 is still 
small. We aim to significantly increase the sample size over 
the coming years. 
APPENDIX 

SLIT LAYOUT, SPECTRA FOR THE LENSED SAMPLE 

This section presents the slit layouts and the reduced and fitted spectra for the newly observed lensed objects in this work. The line 
fitting procedure is described in Section 3.2. For each target, the top panel shows the HST ACS 475W broad-band image of the 
lensed target. The slit layouts with different position angles are drawn as white boxes. The bottom panel(s) show(s) the final 
reduced 1D spectrum (spectra), zoomed in on the vicinity of the emission lines. The black line is the observed spectrum for the target. The cyan 
line is the noise spectrum extracted from object-free pixels of the final 2D spectrum. Tilted grey mesh lines indicate spectral 
ranges where the sky absorption is severe. Emission lines falling in these spectral windows suffer from large uncertainties in 
telluric absorption correction. The blue horizontal line is the continuum fit using a first order polynomial function after blanking 
out the severe sky absorption region. The red lines overplotted on the emission lines are the overall Gaussian fit, with the blue 
lines showing individual components of the multiple Gaussian functions. Vertical dashed lines show the center of the Gaussian 
profile for each emission line. The S/N of each line is marked under the emission line labels. Note that for lines with S/N < 3, 
the fit is rejected and a $3\sigma$ upper limit is derived. 
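
The detection rule in the last sentence can be written as a short sketch. It assumes that the S/N is the fitted flux divided by its $1\sigma$ uncertainty and that the upper limit is three times that uncertainty; both are illustrative assumptions, and the function name `line_measurement` is hypothetical rather than part of our reduction pipeline.

```python
def line_measurement(flux, flux_error, threshold=3.0):
    """Classify a fitted emission line as a detection or an upper limit.

    flux, flux_error : fitted line flux and its 1-sigma uncertainty
    threshold        : minimum S/N for the fit to be accepted
    """
    snr = flux / flux_error
    if snr < threshold:
        # Fit rejected: report a 3-sigma upper limit instead of a flux.
        return {"detected": False, "upper_limit": threshold * flux_error, "snr": snr}
    return {"detected": True, "flux": flux, "flux_error": flux_error, "snr": snr}


# A line with flux 6.0 +/- 0.7 is kept; one with 2.0 +/- 1.0 becomes a limit.
strong = line_measurement(6.0, 0.7)
weak = line_measurement(2.0, 1.0)
```

Here `strong` has S/N of about 8.6 and is kept as a detection, while `weak` is replaced by an upper limit of 3.0 in the same flux units.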

Brief remarks on individual objects (see also Tables 2 and 3 for more information): 

• Figures 9 and 10, B11 (888_351): this is a resolved galaxy with spiral-like structure at z = 2.540 ± 0.006. As reported 
in Broadhurst et al. (2005), it is likely to be the most distant known spiral galaxy so far. B11 has 3 multiple images. We 
have observed B11.1 and B11.2, with two slit orientations on each image. Different slit orientations yield 
very different line ratios, implying possible gradients. Our IFU follow-up observations are in progress to reveal the details 
of this 2.6-Gyr-old spiral. 

• Figures 11 and 12, B2 (860_331): this is one of the interesting systems reported in Frye et al. (2007). It has 5 multiple 
images, and is only 2" away from another five-image lensed system, "The Sextet Arcs" at z = 3.038. We have observed 
B2.1 and B2.2 and detected strong Hα and [O III] lines in both of them, yielding a redshift of 2.537 ± 0.006, consistent 
with the redshift z = 2.534 measured from the absorption lines ([C II] λ1334, [Si II] λ1527) in Frye et al. (2007). 

• Figure 13, MS1 (869_328): we have detected a $1\sigma$ [O III] line and determined its redshift to be z = 2.534 ± 0.010. 

• Figure 14, B29 (884_331): this is a lensed system with 5 multiple images. We observed B29.3, the brightest of the five 
images. The overall surface brightness of the B29.3 arc is very low. We have observed a $10\sigma$ Hα line and an upper limit for 
[N II], placing it at z = 2.633 ± 0.010. 

• Figure 15, G3: this lensed arc with a bright knot had no recorded redshift before this study. It was put on one of the extra 
slits during mask design. We have detected an $8\sigma$ [O III] line and determined its redshift to be z = 2.540 ± 0.010. 

• Figure 16, MS-Jm7 (865_359): we detected [O II], Hβ, [O III], and Hα, and an upper limit for [N II], placing it at redshift 
z = 2.588 ± 0.006. 

• Figures 17 and 18, B5 (892_339, 870_346): it has three multiple images, of which we observed B5.1 and B5.3. Two slit 
orientations were observed for B5.1; the final spectrum for B5.1 combines the two slit orientations weighted by the 
S/N of Hα. Strong Hα and an upper limit for [N II] were obtained in both images, yielding a redshift of z = 2.636 ± 0.004. 

• Figure 19, G2 (894_332): two slit orientations were available for G2, with detections of Hβ, [O III], Hα, and upper limits 
for [O II] and [N II]. The redshift measured is z = 1.643 ± 0.010. 

• Figure 20, B12: this blue giant arc has 5 multiple images, and we observed B12.2. It shows a series of strong emission 
lines, with an average redshift of z = 1.834 ± 0.006. 

• Figure 21, Lowz1.36 (891_321): it has a very strong Hα line and [N II] is at the noise level; from Hα we derive z = 1.363 ± 0.010. 

• Figure 22, MSnewz3: this is a new target observed in Abell 1689. We detect [O II], Hβ, and [O III] at a significant level, 
yielding z = 3.007 ± 0.003. 

• Figure 23, B8: this arc has five multiple images in total, and we observed B8.2. Detection of [O II], [O III], and Hα, with Hβ 
and [N II] as upper limits, yields an average redshift of z = 2.662 ± 0.006. 

• B22.3: a three-image lensed system at z = 1.703 ± 0.004, this is the first object reported from our LEGMS program, see 
Yuan & Kewley (2009). 

• Figure 24, A68-C27: this is the only object chosen from our unfinished observations of Abell 68. This target has many 
strong emission lines, giving z = 1.762 ± 0.006. The morphology of C27 shows signs of a merger. IFU observation of this target 
is in progress. 
FIG. 9. - z=2.540, B11.1 MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. Note that the dashed box indicates the ~0.1 arcsec 
alignment error of MOIRCS. 

FIG. 10. - z=2.540, B11.2 MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 

FIG. 11. - z=2.537, B2.1, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 

FIG. 12. - z=2.537, B2.2, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 

[Figure 13 panels: image centered at R.A. 13 11 28.70, Dec -01 19 42.7, with axes in arc seconds; spectrum plotted against observed wavelength (Å).] 

FIG. 13. - z=2.54, MS1, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 

FIG. 16. - z=2.588, Jm7, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 

FIG. 17. - z=2.641, B5.1, MOIRCS J, H band spectra. Note that the flux of B5.1 in slit position PAn60 is less than in PA45 (B5.1+B5.2) because 
the dithering length of PAn60 was smaller than the separation of 5.1 and 5.2, thus part of the flux of PA45 (B5.1+B5.2) was cancelled out during the dithering 
process. Detailed descriptions are given in the Appendix text. 
FIG. 19. - z=1.643, G2, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 
[Figure 21 image panel: centered at R.A. 13 11 33.98, Dec -01 19 15.8, with axes in arc seconds.] 

FIG. 21. - z=1.363, Low-z, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 

FIG. 23. - z=2.663, B8.2, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 

FIG. 24. - z=1.763, A68C27, MOIRCS J, H band spectra. Detailed descriptions are given in the Appendix text. 



TABLE 4 
Measured Emission Line Fluxes 

| Id | [O II] λ3727 | Hβ | [O III] λ5007 | Hα | [N II] λ6584 | KK04 (→PP04N2) | Branch | PP04N2 | E(B-V)^a | Final Adopted^b |
|---|---|---|---|---|---|---|---|---|---|---|
| B11.1:pa20 | 21.43±2.60 | 5.42±1.63 | 21.12±2.43 | 33.54±2.38 | <4.59 | 8.38(8.16)±0.14 | up | <8.41 | 0.73±0.29 | 8.48±0.18^e |
| B11.1:pa-20 | 22.02±4.88 | 9.40±2.75 | 14.78±2.21 | 28.13±2.56 | 8.90±0.89 | 8.74(8.54)±0.14 | up | 8.61±0.05 | 0.05±0.29 | |
| B11.2:pa-60 | <60.5 | <64.6 | <60.3 | 53.6±4.09 | <17.12 | | | <8.62 | | |
| B11.2:pa45 | 73.65±12.01 | <34.23 | 61.06±7.4 | 80.46±9.9 | <13.08 | <8.74(8.54) | up | <8.73 | 0. | |
| B2.1:pa20 | <2.82 | <6.3 | 9.47±0.56 | 7.66±0.69 | <0.67 | | | <8.30 | | <8.30 |
| B2.2:pa20 | <7.73 | <20.6 | 23.2±3.0 | <5.45 | <6.4 | | | | | |
| MS1:pa20 | <3.08 | <4.5 | 6.7±0.9 | | | | | | | |
| B29.3:pa20 | | | | 17.05±1.6 | <3.1 | | | <8.48 | | <8.48 |
| G3:pan20 | | <4.0 | 6.0±0.7 | | | | | | | |
| MS-Jm7:pan20 | 19.16±2.74 | 12.5±3.6 | 58.12±1.64 | 23.03±1.6 | <9.22 | 8.69(8.33)±0.12 | up | <8.67 | 0. | 8.25±0.18 |
| | | | | | | 8.23(8.19)±0.12 | low | | | |
| B5.3:pan20 | | | | 9.07±0.7 | <2.94 | | | <8.62 | | <8.62 |
| B5.1:pan20 | | | | 30.38±4.6 | <13.34 | | | <8.70 | | |
| B5.1:pa45 | | | | 64.39±3.9 | <59.5 | | | <8.88 | | |
| G2:pan20 | <4.7 | 6.84±0.74 | 36.49±1.25 | 10.09±1.0 | <3.1 | <8.62(8.41) | up | <8.60 | | <8.41 |
| G2:pan60 | <25.8 | <8.8 | <98.7 | | | | | | | |
| Lowz1.36:pan60 | | | | 59.19±7.1 | <8.82 | | | <8.43 | | <8.43 |
| MSnewz3:pa45 | 36.94±11.5 | 44.02±11.06 | 300.3±17.8 | | | 8.5(8.29)±0.11 | up | | | 8.23±0.18 |
| | | | | | | 8.12(8.16)±0.11 | low | | | |
| B12.2:pa45 | <71.58 | <67.79 | 141.01±10.07 | 90.45±6.95 | <10.6 | | | <8.369^c | | <8.369 |
| B8.2:pa45 | 40.2±11.8 | <17.7 | 75.7±6.6 | 115.26±12.5 | <72.13 | <8.51(8.29) | up | <8.78^d | >1.2 | <8.29 |
| | | | | | | <8.11(8.16) | low | | | |
| B22.3:pa60 | 162.1±20.3 | 146.0±29.2 | 942.3±62.8 | 734.4±56.5 | <3.65 | 8.13(8.17)±0.12 | low | <8.22 | 0.54±0.22 | 8.10±0.18 |
| A68-C27:pa60 | 317.01±17.9 | 149.2±17.5 | 884.4±58.9 | 814.6±21.8 | 40.4±10.92 | 8.26(8.25)±0.06 | low | 8.16±0.07 | 0.62±0.11 | 8.16±0.18 |



NOTE. - Observed emission line fluxes for the lensed background galaxies in A1689. Fluxes are in units of $10^{-17}$ ergs s$^{-1}$ cm$^{-2}$, without lensing magnification correction. Some lines are not detected because of the severe telluric 
absorption. 

a E(B-V) calculated from Balmer decrement, if possible. 

b Final adopted metallicity, converted to PP04N2 base and extinction corrected using E(B-V) values from Balmer decrement if available; otherwise E(B-V) values returned from SED fitting are assumed. 

c Based on NIRSPEC spectrum at KECK II (Kewley et al. 2013, in prep). 

d Possible AGN contamination. 

e This galaxy shows significant [N II]/Hα ratios in slit position pa-20. The final metallicity is based on the average spectrum over all slit positions. 
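
As an illustration of how the PP04N2 column relates to the tabulated fluxes, the sketch below evaluates the linear N2 calibration of Pettini & Pagel (2004), 12 + log(O/H) = 8.90 + 0.57 × log([N II] λ6584/Hα). That this linear form (rather than another N2 fit) reproduces the column is an assumption made here; it matches the tabulated entries used below to within about 0.01 dex. The function name `pp04n2` is hypothetical.

```python
import math


def pp04n2(nii_flux, halpha_flux):
    """12 + log(O/H) from the linear N2 calibration (Pettini & Pagel 2004).

    nii_flux    : [N II] 6584 flux, or its upper limit
    halpha_flux : H-alpha flux in the same units
    An upper limit on [N II] yields an upper limit on the metallicity.
    """
    n2 = math.log10(nii_flux / halpha_flux)
    return 8.90 + 0.57 * n2


# B29.3:pa20 in Table 4: [N II] < 3.1 and H-alpha = 17.05.
b29_limit = pp04n2(3.1, 17.05)

# B11.1:pa20 in Table 4: [N II] < 4.59 and H-alpha = 33.54.
b11_limit = pp04n2(4.59, 33.54)
```

Both results are upper limits because the [N II] fluxes entered are upper limits; they come out close to the tabulated <8.48 and <8.41.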
TABLE 5 

Physical Properties of the Lensed Sample 

| ID1 | ID2^a | RA, DEC (J2000) | Redshift | lg(SFR)^b ($M_\odot$ yr$^{-1}$) | Lensing Magnification (flux) | log($M_*/M_\odot$) |
|---|---|---|---|---|---|---|
| B11.1:pa20 | 888_351 | 13:11:33.336, -01:21:06.94 | 2.540±0.006 | 1.08±0.1 | 11.8±2.7 | q 1+0-2 |
| B11.2:pa45 | | 13:11:29.053, -01:20:01.26 | 2.540±0.006 | 1.42±0.11 | 13.1±1.8 | |
| B2.1:pa20 | 860_331 | 13:11:26.521, -01:19:55.24 | 2.537±0.006 | 0.20±0.03 | 20.6±1.8 | °- Z -0.3 |
| B2.2:pa20 | | 13:11:32.961, -01:20:25.31 | 2.537±0.006 | | 15.0±2.0 | |
| MS1:pa20 | 869_328 | 13:11:28.684, -01:19:42.62 | 2.534±0.01 | | 58.3±2.8 | |
| B29.3:pa20 | 884_331 | 13:11:32.164, -01:19:52.53 | 2.633±0.01 | 0.43±0.06 | 22.5±6.9 | 9.0±° o f |
| G3:pan20 | | 13:11:26.219, -01:21:09.64 | 2.540±0.01 | | 7.7±0.1 | |
| MS-Jm7:pan20 | 865_359 | 13:11:27.600, -01:21:35.00 | 2.588±0.006 | | 18.5±3.2 | 8.U_ 4 |
| B5.3:pan20 | 892_339 | 13:11:34.109, -01:20:20.90 | 2.636±0.004 | 0.47±0.05 | 14.2±1.3 | a 1+0-4 y - L -0.2 |
| B5.1:pan60 | 870_346 | 13:11:29.064, -01:20:48.33 | 2.641±0.004 | 1.0±0.05 | 14.3±0.3 | |
| G2 | 894_332 | 13:11:34.730, -01:19:55.53 | 1.643±0.01 | 0.45±0.09 | 16.7±3.1 | 8.0l°; 3 4 |
| Lowz1.36 | 891_321 | 13:11:33.957, -01:19:15.90 | 1.363±0.01 | 0.67±0.11 | 11.6±2.7 | |
| MSnewz3:pa45 | | 13:11:24.276, -01:19:52.08 | 3.007±0.003 | 0.65±0.55 | 2.9±1.7 | 8.6±g : r |
| B12.2:pa45 | 863_348 | 13:11:27.212, -01:20:51.89 | 1.834±0.006 | 1.00±0.05^c | 56.0±4.4 | 7 4+0.2 '•^-o.o |
| B8.2:pa45 | | 13:11:27.212, -01:20:51.89 | 2.662±0.006 | 1.36±0.07^d | 23.7±3.0 | o 9+0. 5f s - z -0.6 |
| B22.3:pa60^e | | 13:11:32.4150, -01:21:15.917 | 1.703±0.006 | 1.88±0.04 | 15.5±0.3 | 8 5+ U - y 8 -->-0 .2 |
| A68-C27:pa60 | | 00:37:04.866, +09:10:29.26 | 1.762±0.006 | 2.46±0.1 | 4.9±1.1 | ofi+01 y -°-o.i |



NOTE. - The redshift errors in Table 5 are determined from the RMS of different emission line centroids. If the RMS is smaller than 0.006 (for most targets) 
or if there is only one line fitted, we adopt the systematic error of 0.006 as a conservative estimate for absolute redshift measurements. 

a ID used in Richard et al. (2012, in prep). The name tags of the objects are chosen to be consistent with the Broadhurst et al. (2005) conventions if 
overlapping. 

b Corrected for lensing magnification, but without dust extinction correction. We note that the systematic errors of SFR in this work are extremely uncertain 
due to complicated aperture correction and flux calibration in the multi-slit of MOIRCS. 

c Based on NIRSPEC observation. 

d Possible AGN contamination. 

e See also Yuan & Kewley (2009). 

f The IRAC photometry for these sources is not included in the stellar mass calculation due to the difficulty in resolving the lensed image from the 
adjacent foreground galaxies. 
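
The redshift error rule in the table note can be sketched as follows. The snippet assumes that the RMS is the scatter of the per-line redshifts about their mean and that rest wavelengths are supplied by the user; the function name `redshift_with_error` and the example wavelengths (approximate values for [O III] λ5007 and Hα) are illustrative only.

```python
import statistics


def redshift_with_error(observed, rest, floor=0.006):
    """Mean redshift from several line centroids, with a systematic floor.

    observed, rest : matching lists of observed and rest wavelengths
    floor          : systematic error adopted when the RMS is smaller,
                     or when only one line is fitted
    """
    redshifts = [obs / r - 1.0 for obs, r in zip(observed, rest)]
    z_mean = statistics.fmean(redshifts)
    if len(redshifts) < 2:
        return z_mean, floor
    rms = statistics.pstdev(redshifts)  # scatter about the mean (assumed)
    return z_mean, max(rms, floor)


rest = [5006.84, 6562.8]  # approximate rest wavelengths in Angstrom

# Two lines that agree exactly: the 0.006 floor is adopted.
z_a, err_a = redshift_with_error([r * 3.54 for r in rest], rest)

# Two lines that disagree by 0.04 in redshift: the RMS (0.02) is used.
z_b, err_b = redshift_with_error([rest[0] * 3.52, rest[1] * 3.56], rest)
```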